

DOCUMENT RESUME 

ED 025 273 LI 001 129

By- Libbey, M. A.; Blum, A. R.

A Study of Information Elements for the National Information System for Physics.
American Inst. of Physics, New York, N.Y.

Spons Agency- National Science Foundation, Washington, D.C.
Pub Date Jun 68
Grant- NSF-GN-686
Note- 62p.

EDRS Price MF-$0.50 HC-$3.20

Descriptors- *Automation, *Cataloging, Data Analysis, Data Sheets, Documentation, Indexing, *Information
Processing, *Information Storage, Information Systems, *Physics, Records (Forms), Standards

The identification of information elements can provide an important tool for the
systematic development of an information system design. A state-of-the-art survey
reveals mounting recognition and interest in the problem, a considerable history of
prior efforts, but no well-defined methodology. A study in the context of a national
information system is reported. A "trial structure" has been developed and is
described. (Author)



























003123    June 1968



A Study of Information Elements for the 
National Information System for Physics 







by 

M.A. Libbey 
and 

A.R. Blum 






Information Division 
AMERICAN INSTITUTE OF PHYSICS 
335 East 45 Street, New York, New York 10017 



Report on work supported by the National 
Science Foundation under Grant No. NSF-GN 686 



U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE
PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS
STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION
POSITION OR POLICY.




ABSTRACT 



The identification of information elements can provide an important
tool for the systematic development of an information system design. A
state-of-the-art survey reveals mounting recognition and interest in the
problem, a considerable history of prior efforts, but no well-defined
methodology. A study in the context of a national information system is
reported. A "trial structure" has been developed and is described.






Preface 










One of the efforts undertaken by the American Institute of Physics
under National Science Foundation Grant No. GN-686, "Additional Prerequisites
for Development of a National Information System for Physics," was a study
of appropriate information element structures. The effort, reported in
this paper, was intended to provide a basis for considered decisions as
to the nature, extent, and priority of any future information element
standardization effort by AIP.








TABLE OF CONTENTS

Abstract
Preface
Table of Contents

PART ONE: GENERAL CONSIDERATIONS
by Miles A. Libbey

I. Introduction
II. A Tool for Systems Analysis and Development
III. Background
IV. Definitions
V. Alternatives
    A. Objectives
    B. Products and Coverages
VI. Method of Approach

PART TWO: INFORMATION ELEMENTS FOR PHYSICS
by Arthur R. Blum

VII. Implementation
VIII. The Data Element Structure
    A. Structuring of Terms
    B. Term Levels
    C. Classification
IX. The Three AIP Sectors
    A. Bibliographic Data Elements
    B. AIP Files, Records and Resources
    C. System Analysis Vocabulary
X. The Experimental Data Element File
    A. Overall Features
    B. Detailed Components
    C. Criteria
XI. Recommendations and Conclusions

APPENDICES

1. Sample List of Data Elements in Experimental File
2. Data Element Description Sheet (Current)
3. Data Element Description Sheet (Previous)
4. Structuring Display of Bibliographic Data Elements
5. Occurrence of Common Bibliographic Data Elements
6. Standardization of Data Elements

References



I. Introduction



To design an information system in which, in general, the flow of
information (e.g., printing, writing, acoustic or electrical waveforms)
is to be mediated by humans, it may only be necessary to note,
tautologically, that "information flows in an information system." If,
however, any significant role is to be assigned to digital computers, this
will normally not suffice. Where a computer is involved, any elements of
"meaning" which it is to receive from a human source by any means, and
which it is to retain in fact within itself, must be utterly explicit.

It is also a truism that in the present state of the digital computer
art, if one machine (or automated system) is to interchange information
with another, the elements being exchanged must be standardized explicitly
and in extenso. (It need not concern us at the moment whether or not this
would apply to "Perceptron type" machines or systems or sophisticated
translation or parsing programs.)

Present plans of the American Institute of Physics call for use
of computers in the process of producing primary physics publications
by photocomposition. It is also anticipated that the byproduct tapes
will be processed by computer for various purposes. To these extents,
the eventual selection, definition and detailed format specification of
units of information which are to be handled (input, processed, output)
in the system becomes a sine qua non. This requirement was not the
original reason for AIP's considering the information element
standardization problem at this time. It did, however, assume increasing
importance as the plans for a computer implementation of the system became
more concrete.

Section II of the report will discuss the usefulness of a standardized
information element structure as a tool for systems analysis and
development. This was the original objective of the study here reported.

Since the general problem of information element standardization
is not familiar to most people, some effort is made to put it in
perspective by discussing its background in Section III. Section IV deals
with definitions, a problem that has plagued efforts in this field.
Section V presents alternatives which could be considered for an
information element standardization program, and Section VI, on that basis,
describes the philosophy of the approach that had been planned for AIP
in getting started in this area. The actual implementation of the plan
is outlined in Section VII. The subsequent sections contain the
description of the resulting tentative data element set: its structure,
sources, and contents.

Anyone considering the information element standardization problem
should be warned that it seems to create more than a reasonable number of
cases of confusion in inter-personal communications. The problem seems
to be mainly due to confusion of levels of terminology, undoubtedly a
major occupational hazard in any undertaking in which words must be used
to talk about other words.

Another terminology difficulty results from the fact that many of
the terms that must be used routinely are themselves vague and ambiguous.
Such terms include "information," "data," and "element." The definitions
of combinations of such terms are often even more vague and may be
downright misleading.






II. A Tool for Systems Analysis and Development



Basic to any systematic approach to the design of a system is the
identification and characterization, quantitatively if possible, of whatever
will flow in the system and of whatever will determine or affect the flow in
the system. In some cases this also helps to identify all of the nodes and
interconnections between them which constitute the system. It is to be noted
that the fact that a system is being designed in no way implies that the
design will ipso facto be systematic.

Ideally such identification of the flow of information would take into
account information of all forms, including when appropriate the full text of
conversations, books, journal articles, etc. However, the state-of-the-art
of information system design, especially, but not only, when the system is to
be automated, does not permit such a thorough treatment. For now the system
must be conceived of as handling discrete, demarkable and identifiable elements
of information and data, especially data.

The topic of definitions of information and/or data elements requires
special attention and will be discussed in Section IV. In the meantime, an
information or data element can be thought of in operational terms as a
concept which is particularized by a character, sequence of characters,
sequence of sequences, etc., which appear in some particular location in space
or time, such as a field on a punched card or in a magnetic tape or disc file
record, a particular place on a piece of paper, the nth item of a formatted
message, etc. For example, "WEST" appearing in the appropriate location would
particularize the information element "Last name of author," while in a
different location it might particularize the element "East or West longitude."
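
The operational view just given, in which the location of a character string determines which element it particularizes, can be illustrated with a minimal Python sketch. The two card layouts, their column positions, and the function name `parse_record` are assumptions made purely for illustration; they are not taken from any AIP file or card format.

```python
# Hypothetical fixed-width layouts: (element name, start column, end column).
# The column positions are invented for this illustration.
AUTHOR_CARD = [
    ("Last name of author", 0, 12),
    ("Title", 12, 40),
]
POSITION_CARD = [
    ("Latitude degrees", 0, 4),
    ("North or South latitude", 4, 12),
    ("Longitude degrees", 12, 16),
    ("East or West longitude", 16, 20),
]


def parse_record(line, layout):
    """Return {element name: value} by slicing the line at each field's columns."""
    return {name: line[start:end].strip() for name, start, end in layout}


# The same characters "WEST" particularize different elements,
# depending only on where they appear in the record.
author = parse_record("WEST".ljust(12) + "A STUDY OF ELEMENTS", AUTHOR_CARD)
position = parse_record("0040" + "NORTH".ljust(8) + "0074" + "WEST", POSITION_CARD)

print(author["Last name of author"])       # WEST
print(position["East or West longitude"])  # WEST
```

The point of the sketch is only that the layout, not the characters themselves, carries the identity of the element.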

Typical of the kinds of information elements that might be expected to be
important to a national information system for physics are the following:



Author 
Co-Author 
Title 
Journal 
Volume Number 
Inclusive Pages 



Descriptor 

Classification Number 
Query 

Query Number 
Logical Connective 
Count of Citations 



End of Message 
Patent Number 
Method 

Classification Symbol 
Target Nucleus 
Bombarding Particle 
Emitted Particle 
Energy Range 



To this list can be added elements needed by those concerned with the operation
and financial support of the system. These might include elements that pertain
to records of system usage as well as the more obvious ones of cost, computer
time, telephone line time, etc.

However, to identify or standardize such elements in order that they
might actually be used in an operating system some day is not the only purpose
of the project here discussed. Rather, the original purpose was to develop
a consistent and structured language which would assist the systems
development staff to proceed with its task in a systematic and scientific way.
For this a considerably more sophisticated structure and a considerably larger
number of elements was required. Typical of such additional elements are the
following:

Most-used journal
Preferred information channel
Source of awareness of existence of stated journal article
Percent of person's time spent in physics research and development
Months in current sub-specialty
Journals subscribed to
Journals scanned or skimmed personally
Attitude toward preprint centralization proposal
Effect on own work of responses from recipients of own documents
Whether or not a contributed paper later appeared as a journal article

It can be seen that these are far more complex than the preceding list and that
the relations between these can be quite intricate, yet the data named and
identified by such elements are essential to the systematic design of a national
information system for physics. To avoid major sins of omission (by not utilizing
existing data) or of commission (by conducting research that was not needed), the
systems designers need a comprehensive and internally consistent means of
identifying, organizing, naming and classifying all such data found in files and
documents, whether at AIP or elsewhere and whether generated by their own efforts
or those of others. The paramount purpose of the development of an information
element structure was to fulfill, or at least indicate how it was possible to
fulfill, that need.

















III. Background



The recognition of the need to identify and standardize elements of
information and data that occur within and are handled by a system or group
of systems can be traced to the experience of the Army and Navy in World War
II. The fantastic growth in intrinsic complexity and in the numbers of items
carried in both material and personnel logistics systems compelled the
development of new operational logistics, inventory, and communications
techniques. In addition the sheer cost of military operations made it
impossible to tolerate situations where a single physical item might actually
appear as more than a hundred different inventory items, under different stock
numbers, and requiring separate procurement, accounting, stocking,
requisitioning, etc. Even more intolerable were situations in which military
weapons and equipment such as anti-aircraft guns might be out of action simply
because it was not realized that a missing part was available at hand under a
different stock number than the one shown in the repair manual.

Another factor favoring standardization was the realization that the
communication protocols and circuit disciplines developed by different
organizations were simply not compatible. The increasingly global and
inter-service nature of military operations caused these difficulties in
inter-organization communication to compound the difficulties in logistics.
Both were in turn further compounded by the attempt to solve them by applying
computer technology. Where before human data processors might be able to solve
a difficulty, or might at least recognize that there was a difficulty, the
electronic data processors could only cope with those difficulties that had
been foreseen and programmed for.

All these factors got worse in the post-war period. By the early 1950's
all of the military services, and, in addition, other government agencies such
as the Census Bureau, were putting computers to work to process the masses of
management, personnel and logistics data which they had to handle. In general,
however, all of these systems, including the large military systems, were
designed and developed only in the context of their individual needs. Therefore,
the selection, definition, and detailed format specification of units of
information which could be handled (input, processed, output) were done
independently. It is therefore difficult or impossible for such systems to
communicate with one another as is increasingly required by considerations of
economy, efficiency, flexibility and, sometimes, even survival.

Recognition of the need for standardization of elements of information
has, since then, spread rapidly. Element standardization programs which
originated within a DOD department were picked up, or in some cases overridden,
at the Department level. The Department-level programs were in turn overridden
in 1964 by a program at the DOD level. And this has since found itself
subordinated to a government-wide program directed by the Bureau of the Budget.


The tool thus developed has more recently spread outside of Government.
While the concept undoubtedly occurred independently in some companies, there
was certainly some direct transference to industry from its development in the
military. For one, the Sutherland Co., Peoria, Ill., extended the techniques
and methods used in this area in parts of the U.S. Air Force and, as a
management consulting firm, introduced them in many client companies.

In 1966 a group was organized in the American Standards Association (now
the USASI, U.S.A. Standards Institute) at the instigation of commercial, rather
than Federal, agencies, to identify and standardize elements of information
required in the interchange of information in industry, business and Government.
This group is now Subcommittee X3.8, "Data Elements and Their Coded
Representation." More recently, in response to the rapidly increasing rate of
application of computers to scientific and technical information and to library
problems, another group within the USASI (Z 39 SC 2) has addressed itself to the
identification and standardization of the elements of information occurring in
the preparation, description, and interchange of bibliographic information such
as citations.

Many other organizations have more recently expressed an interest in the
data element standardization problem. These include COSATI and SATCOM. In
particular, the Task Group for Interchange of Scientific and Technical
Information in Machine Language (ISTIM), recently established by the Office of
Science and Technology, recognized this as a basic and important problem.









IV. Definitions



The problem of arriving at a standard definition for an information or
data element has given extraordinary difficulty. For all practical purposes
the terms "information element" and "data element" have been used to refer to
the same things. The definition that would appear to have the greatest
consensus of agreement at the moment would be the following one by USASI
Subcommittee X3.8:

"Data Element - A grouping of informational units which has a unique
meaning based on a natural or assigned relationship and subcategories
(data items) of distinct units or value."

It is impossible, or at least unwise, to isolate the definitions given for
information and/or data elements from certain closely associated concepts.
In the case of X3.8 these are:

Data Item - A unit of distinct information or value classified under a
data element which cannot be logically subdivided and retain significance
of the data element grouping.

Data Use Identifier - A name or title given to the use of a data element.

Data Code - A number, letter, symbol or any combination thereof used to
represent a data item.

Data Group Identifier - A name or title given to the use of a combination
of two or more related data elements.

Data Element Reference - A number, letter, symbol or any combination
thereof used to represent a data element.

Data Use Reference - A number, letter, symbol or any combination thereof
used to represent a data use identifier.

Data Group Reference - A number, letter, symbol or any combination
thereof used to represent a data group identifier.



The following examples are used to illustrate the above definitions:

Data Element             Year
Data Item                1967
Data Use Identifier      Model Year
Data Code                1967

Data Group Identifier    Date of Action
Data Elements            Year    Month    Day

Data Group Identifier    Date of Purchase
Data Elements            Year    Month    Day
Data Items               1967    April    24
Data Codes               1967    04       24
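
The relation among data element, data item, data code and data group in the examples above can be sketched in Python. This is a minimal illustration under stated assumptions, not an X3.8 specification: the class names `DataElement` and `DataGroup`, the `encode` methods, and the month code table are all hypothetical constructions built around the "Date of Purchase" example.

```python
from dataclasses import dataclass
from typing import Optional

MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


@dataclass
class DataElement:
    """A named element; `codes` maps each data item to its data code.

    If `codes` is None, a data item is assumed to serve as its own code.
    """
    name: str
    codes: Optional[dict] = None

    def encode(self, item):
        return item if self.codes is None else self.codes[item]


@dataclass
class DataGroup:
    """A data group identifier naming an ordered combination of elements."""
    identifier: str
    elements: tuple

    def encode(self, items):
        if len(items) != len(self.elements):
            raise ValueError("one data item is required per data element")
        return [element.encode(item) for element, item in zip(self.elements, items)]


YEAR = DataElement("Year")
MONTH = DataElement("Month", {name: f"{number:02d}"
                              for number, name in enumerate(MONTH_NAMES, start=1)})
DAY = DataElement("Day")

date_of_purchase = DataGroup("Date of Purchase", (YEAR, MONTH, DAY))
codes = date_of_purchase.encode(["1967", "April", "24"])
print(codes)  # ['1967', '04', '24']
```

Only the Month element needs an explicit code table here; for Year and Day the item and its code coincide, as in the example table.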



Probably the definitions which have the most influence at the moment
are those of the DOD:



"Data Element - A grouping of informational units which has a unique
meaning and subcategories (data items) of distinct units or values.

Data Item - A subunit of descriptive information or values classified
under a data element.

Data Chain - A name or title given to the use of a combination of two
or more logically related standard data elements, use identifiers, or
other data chains."

The definition of "data element" given at the highest Federal level,
the BOB, is identical to that given above by X3.8 except for use of the plural
form of the last word, "values." However, for "data code," BOB gives:

"A data code is a number, letter, symbol or any combination thereof
used to represent a data element or a data item."

Subcommittee X3.5 of USASI has submitted the following definitions:

"Data Element - The name for a class or category of data based on natural
or assigned relationships that can be used to denote a set of data items.

Data Item - The name for an individual member of set denoted by a
data element.

Data Code - A structured set of characters used to represent the data
items of a data element.

Data - Any representations such as characters or analog quantities to
which meaning is or might be assigned.

Information - The meaning assigned to data by known conventions.

Data Chain - See macroelement.

Macroelement - An ordered set of two or more elements used as one data
element with a single data use identifier."

The following definitions were given by the information element
standardization program of the MITRE Corporation in 1964:

"Information Element is a definable entity whose values, when determined,
convey knowledge in an information system.

Value is the smallest piece of information that may be used to communicate
intelligence."

All of the above represent consensuses, and therefore compromises. The
following definitions were proposed by one of the authors (Libbey) in 1965 on
the basis of a study of relevant material in linguistics and logic:

Information Element - A concept selected, defined and distinctively
symbolized for efficient communication which collects, designates, gives
meaning to and is extensionally defined by a specified set of its instances.

Data Element - An information element which deals with data, i.e. with
that kind of information which is usually thought of as being tabulated,
listed or otherwise formatted.

Value - One of the specified set of instances of the concept designated
by an information element; also, the symbolization thereof.

It is still submitted that these represent the picture more accurately,
although expediency may require subscription to one of the previously given
definitions, presumably that given by X3.8.













V. Alternatives 






Selection of an approach to the information element standardization
program in AIP is rendered difficult by the fact that: (1) the information
element standardization methodology itself is far from well developed and
understood and (2) AIP's needs for such a tool have not yet been adequately
explored. It is not even certain just what variables would be most critical
in this case. Several will be discussed in this section.

A. Objectives 

For one thing, the approach taken can be expected to vary with the
objective. Various distinguishably different possible objectives for a
study or project concerned with information elements and their standardization
are listed below. Although the objectives listed cannot be thought of as
related in any simple linear fashion, in general they are listed in a "least
to most" order.

1. Problem Knowledge and Definition. Any exploration at all of the
problem will contribute to overall knowledge, problem definition,
identification of parameters, magnitude estimates, etc. Any of these should be
useful to AIP in its systems development efforts.

2. Information Interface Description Tool. The identification and
definition of information elements in a standard manner is an absolute
essential for describing information interfaces between automated systems.

3. Systems Analysis. By identifying and defining information elements
throughout a system instead of only at an interface and by paying more attention
to document description and the delineation of the flow of documents
throughout the system a useful and sometimes indispensable tool for systems
analysis can be developed.

4. Systems Network Analysis Tool. Extension to additional systems of
any information element structure developed as a tool for analysis of a single
system would be very useful in analyzing a network of inter-connected systems.

5. Basis for an Operational Information Retrieval System. Any
standardization system pitched at the information element level necessarily
implies going through much detailed labor to identify and define elements. The
product of such detailed labor should be directly applicable to the
establishment of an operational retrieval system, whether manual,
semi-automated or automated, since any such standardization system, as
presently envisioned, should supply at least the following basic elements of an
information retrieval system: classification system, controlled vocabulary of
index terms, relation of synonyms, and presumably a capability to relate index
terms appearing on specific documents.







6. Inputs for Developments Based on Linguistic Theories. Further
refinement should enable the same detailed labor to produce, as another
by-product if desired, the kind of data needed for development of modern
linguistic theories or their actual application to a national physics
information system. The principal extension required for this objective would
be coverage (or more coverage) of non-substantive types of terminology and of
syntactic relations.

7. Manual Operational Translation Facility. The necessary procedures,
files, etc., could be developed to establish a manual translation capability
in, for example, an information analysis center to translate information
expressed in the terms and formats of one system into the terms of the
standardized information element structure and if desired from this into the
terms and formats of other systems.

8. Semi-Automatic Operational Translation Facility. By automating
the procedures and files needed for the facility described in the preceding
paragraph, but retaining humans to perform some of the more sophisticated
decisions and processes, a semi-automatic operational translation facility
could be developed to link two or more automated information systems. The
delays inherent in such a facility should be small enough to make it
practicable.

9. Fully Automated Operational Translation Facility. In principle it
should be possible to extend the process mentioned in the last two paragraphs
and develop a fully automated facility which would receive information messages,
reports, etc., in the terms and formats of one system and produce as outputs
the same information in terms of the standardized system and/or in the terms
and formats of any desired other system. There would be little doubt as to
the usefulness of such a facility. However, it would be very costly and
extremely difficult to justify on economic grounds.
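
The translation facilities of objectives 7 through 9 share one idea: a record in the terms of one system is first restated in the standardized element structure and then, if desired, restated in the terms of another system. A minimal Python sketch of that two-step mapping follows. The field tags of "System A" and "System B", the mapping tables, and the function `translate` are hypothetical; fields with no known mapping are set aside, standing in for the decisions that a semi-automatic facility would leave to a human.

```python
# Hypothetical field names for two systems; only the middle column
# (the standardized element names) reflects elements named in this report.
A_TO_STANDARD = {"AU": "Author", "TI": "Title", "JN": "Journal"}
STANDARD_TO_B = {"Author": "author_name", "Title": "article_title",
                 "Journal": "journal_name"}


def translate(record, mapping):
    """Rename the fields of a record according to a mapping table.

    Returns (translated, unmapped); unmapped fields are kept aside
    so that a human can decide how to handle them.
    """
    translated, unmapped = {}, {}
    for field, value in record.items():
        if field in mapping:
            translated[mapping[field]] = value
        else:
            unmapped[field] = value
    return translated, unmapped


record_a = {"AU": "WEST", "TI": "A STUDY OF ELEMENTS", "XX": "local remark"}

standard, needs_review = translate(record_a, A_TO_STANDARD)
record_b, _ = translate(standard, STANDARD_TO_B)

print(standard)      # {'Author': 'WEST', 'Title': 'A STUDY OF ELEMENTS'}
print(needs_review)  # {'XX': 'local remark'}
print(record_b)      # {'author_name': 'WEST', 'article_title': 'A STUDY OF ELEMENTS'}
```

Routing every translation through the standardized structure means each system needs only one pair of mapping tables, rather than one pair for every other system it exchanges information with.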

B. Products and Coverages

For any one of the objectives listed above there would still be a
variety of possible products or coverages which could be chosen. The following
list is intended to be at least indicative of this spectrum of choices.
Each possibility listed is itself capable of further variation.

1. Dictionary. The most modest coverage would be that required to
produce a simple dictionary of the terms to be used in a national information
system for physics. This would serve the same function with respect to
standardizing the physics information system's language as does the
conventional dictionary with respect to standardizing a natural language. This
would involve merely the identification and the conventional definition of
terms used with no attempt at inter-relation or classification. Such an
effort would, of course, make maximum possible use of the AIP glossaries and
of any useable products of AIP classification and indexing efforts.

2. Dictionary with Adjuncts. A considerably more useful tool would
result if, in addition to a simple dictionary, the terminology structure of
the physics information system's language was further elaborated by means of
such devices as cross-references in the dictionary, a thesaurus, a
classification scheme, etc. Hardly any extension of this coverage or of any of
those listed after this would be needed to provide for language purification:
confusing or misused terminology, unnecessary synonyms and ambiguously used
terms could all be recognized and documented.

3. A Standardized Information Element Structure. This would involve
the establishment of standardized information elements meeting rigid
requirements of uniqueness, exhaustiveness, and explicitness. Combinations of
such basic elements which were found to be needed in the conduct of the systems
operations would also be established.

4. Value Recording. In some cases it might be desirable to identify
and to record the various possible values (sets of numeric, alphabetic,
alphanumeric, special, etc., characters that can be used to represent each given
information element allowable throughout the system). In such cases explicit
rules may, if appropriate, be specified which must be satisfied for a value to
be accepted as valid.

5. Concordances. In carrying out any of the foregoing a concordance
can be developed (i.e., an "inverted file") linking information elements to
their uses in the system. Such a capability would make it possible to locate
all places where information on a given topic existed in the system.

6. Formatted Message Description. This could consist of merely
identifying the information elements in formatted messages which were to be
used in the system (and probably the sequence in which they occur) or, more
usefully, could go on to specify any additional constraints on information
element values that might be imposed by particular message formats.

7. General Document Description. The foregoing formatted message
description could be extended to the more general case of describing
documents in general. How far this could usefully be carried is impossible to
say out of the context of specific requirements and other detailed
information.

8. Information Processing Rules and Algorithms. This would involve
the explicit description and standardization of the rules and algorithms
according to which various input information elements are processed, operated
upon, combined, output, etc.

9. Other System Information. Actual implementation of any of the
above listed alternatives would involve amounts of detailed labor varying
from the considerable to the staggering. It will almost always be possible
in the process to generate extremely valuable by-products at little additional
cost. Examples: place (or frequency) of usage of terms, rate and volume
figures for information flow, tracing of the paths followed through the system
by information either in terms of documents or in terms of information
elements, identification and description of nodes in the information processing
network in terms of the information transformations they effect, etc.
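
Two of the products listed above, value recording (item 4) and concordances (item 5), lend themselves to a short Python sketch. Everything specific in it is an assumption for illustration: the validity rules, the three document descriptions, and the function names `is_valid` and `build_concordance` are hypothetical and are not drawn from the experimental file described later in the report.

```python
import re
from collections import defaultdict

# Hypothetical validity rules (item 4): a value is accepted for an element
# only if it matches that element's rule; elements without a rule accept anything.
VALUE_RULES = {
    "Volume Number": re.compile(r"[0-9]+"),
    "Inclusive Pages": re.compile(r"[0-9]+-[0-9]+"),
}


def is_valid(element, value):
    rule = VALUE_RULES.get(element)
    return rule is None or rule.fullmatch(value) is not None


# Hypothetical document descriptions: which elements each document type carries.
DOCUMENTS = {
    "citation record": ["Author", "Title", "Journal", "Volume Number",
                        "Inclusive Pages"],
    "query log": ["Query Number", "Query", "Author"],
    "abstract journal entry": ["Author", "Title", "Descriptor"],
}


def build_concordance(documents):
    """Invert the document descriptions: element -> sorted list of documents using it."""
    inverted = defaultdict(set)
    for document, elements in documents.items():
        for element in elements:
            inverted[element].add(document)
    return {element: sorted(uses) for element, uses in inverted.items()}


concordance = build_concordance(DOCUMENTS)
print(is_valid("Volume Number", "12"))   # True
print(is_valid("Volume Number", "XII"))  # False
print(concordance["Author"])
# ['abstract journal entry', 'citation record', 'query log']
```

The concordance is simply the document descriptions turned inside out, which is why it can be produced as a by-product of any of the other coverages at little additional cost.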







VI. Method of Approach 



Since AIP's program is still in an exploratory stage, it is appropriate
to choose an exploratory approach to its information element aspects. A
heuristic, first-approximation information element standardization structure
has been developed. Herein lies the key to an appropriate approach for AIP at
this time. There are many different problems in the actual process of developing
such a structure. Experience has shown that it is all too easy to err
seriously by going into one, or a few, of these problems in depth, and at
considerable expense, only to find later that other problems needed either
concurrent or prior attention. The best (perhaps the only) way to avoid such
errors is to head as quickly as possible for a first-round "trial structure."
This will be intentionally inadequate and perhaps wrong. It will, however, have
as its principal merit that some recognition will have been given, or
attempted, to identifiable problems, or aspects, of the task of developing such
a structure.

A few of the more obvious of these problems are:

1. What will be considered to constitute an "information element" (or 
"data element") for this purpose? 

2. Will there be different kinds of elements? For example, should data
elements that might later be needed in either operations or production
subsystems of an eventual physics information system be distinguished
from those established primarily for the use of the systems
development activity? If so, why? And how?

3. Will ranges or domains for the values (or "data items") be specifi- 
ed? If so, how? 

4. What "files" will be established?

a. Information element definitions? 

b. Document/message/file descriptions? 

c. Term list?

d. Classification schedules? 

e. Etc.? 

5. To what extent can structures already established, such as those of
X3.8, Z39, DOD, MITRE, etc., be adopted or adapted?

6. To what extent will the initial structure be made to be amenable to 
conversion to machine readability? 

7. What provisions will be made for concatenating or combining two or 
more elements to form an entity that will act as an element in its 
own right at times? (E.g., "Date" composed of the basic elements 
"year," "month," and "day of month.") 

8. At what intellectual/semantic level will variation of concept be 
considered as being different, with respect to what is called out 
as one element of several, from variations of use? 

9. What relations between elements will be noted on work sheets? 






10. Which of these will be carried over onto the formal element de- 
finition entries? 

The foregoing are only representative of the problems which arise, and 
they are not even the most fundamental. Obviously, the "trial structure" is 
intended to be changed, very possibly in toto. Therefore it should be
constrained to staying small enough so that human inertia will not become a
factor. Similarly the amount of effort put into any classification scheme or
schemes must be limited. And finally, no detailed attention is given to for- 
matting for machine processing. 

An approach with these limitations is described in the remainder of 
this report. 



VII. Implementation 



It is apparent from the above discussion that data element
standardization is a semantic approach to controlling the information transfer
process. Its purpose is to facilitate the use and exchange of information,
particularly information contained in the data that appear in machine-readable
form. The orientation of this study is based on the assumption that the
standardization of meaning and the units used to represent meaning is
fundamental to proper control of the use and interchange of data. Obviously,
such data will have to be communicated among the human and machine components
of the National Physics Information System.

The approach of this study has been exploratory and heuristic. It was
hoped that two results might be yielded by it. First and foremost, a method
of identifying, developing and using a data element structure would be found.
Second, the mechanism for an operational system would be proposed.

The strategy of the study was first to survey the physics information
world about which a data element language must speak. The main sectors of this
world pertinent to vocabulary control were surveyed and explored. Both the
documented and operational information resources offered by promising relevant
subject areas and the various divisions of the American Institute of Physics
were tentatively identified. It was among these sectors that significant
reiterated terms occurred. Only those terms considered eligible for inclusion
in a controlled standard data element vocabulary were chosen for display in an
experimental prototype data element file.

The next phase went on to structure the terms. Structuring the data 
elements was the first step, after identification and selection, in organizing 
the various terms. The data element structure assumes that the data element 
is the name or generic designation for certain items. Thus, the data element 
"Type of Equipment" might have data items such as "Keypunch," "Verifier" and, 
perhaps even more specifically, "IBM .1^1 Computer." The items are named and 
hence may be organized and recalled by this name. 
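The naming relation just described can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `element_of` are assumptions, not part of the experimental file, and only two of the data items mentioned above are listed.

```python
# Hypothetical illustration: a data element is a generic name,
# and its data items are the specific values it denotes.
data_elements = {
    "Type of Equipment": ["Keypunch", "Verifier"],
}


def element_of(item, elements):
    """Return the names of all data elements that list `item` as a data item."""
    return [name for name, items in elements.items() if item in items]


print(element_of("Keypunch", data_elements))
```

Because each item is filed under the name of its element, recalling the items by that name is a plain lookup.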

In addition, when various names can be interrelated within the
structure, by being grouped or linked together, we have a design for a language
that can represent the information world at hand. Classification and a
number of other organizational techniques were applied to the data elements,
including definitions whenever needed. Attention was given only to the design
of the semantic system. Significant coding and formatting problems were left
to future stages of standardization of the operational system. These include
questions of standard character types, allowable character set extensions,
control characters, modes of representation, message and field formatting and
sizes (although free fields are strongly wanted), standard media, and common
codes for data elements and data items. Nevertheless constant endeavor was
made to keep the design to a level of simplicity which could, without loss
of descriptive or discriminative power, effect the efficient handling,
transfer and exchange of blocks of information between man and machine, to some
extent between man and man and, at a later stage using codes, between machine
and machine.






At least three distinct sources of data elements emerged. The first
sector coincides with the traditional published physics literature. It
contains principally the detailed bibliographical units used in referring to
published documents (such as author, title, journal name, classification and
indexing terms, citations, etc.). Precise and unambiguous designation of
these units can assume special importance in the planned computer-based
photocomposition of the primary journals in physics, as well as the
subsequent development of a bibliographic data base and further byproducts from
that base. The second sector comprises general management files and their
contents as well as special collections relating to physicists - both
individually and collectively - and to events in physics. Data element
control of such files could help provide day-to-day and line supervisory
records and their functional interpretation for the organizational and
evaluative needs of decision makers. Control of the information contained in the
special collections and archives would, if deemed feasible, enable improved
services and products to be developed (e.g. from AIP resources such as
historical biographical and special bibliographical repositories, institutional
records, etc.). The third and final sector covers the system analysis, design
and development activities, where a distinctive information metalanguage is
used as a vocabulary to talk about information.



VIII. The Data Element Structure 



A. Structuring of Terms 

The brief discussion of definitions given in Section IV indicates
certain shortcomings in currently accepted conceptual definitions of data elements
and related concepts. Operational definitions, particularly those of the type
used by USASI X3.8 in its Technical Guidelines, may be somewhat less
objectionable. Such definitions should take into consideration the components of a data
element structure, their functions and interrelations. A deeper examination
of the interrelations between the components of a data structure (i.e. data
elements, data items, data use identifiers, data groupings, etc.) necessarily
involves consideration of data processing theory, particularly the theory
underlying the practices of computer programming, information retrieval and
construction of artificial languages. Although the method of this study,
starting with empirical exploration and heuristic problem-solving approach,
was carefully chosen so as to preclude excessive involvement in the
intricacies of these vast subject areas, the need to understand the problems clearly
made it helpful to turn to theory occasionally.

The prodigious growth of computer technology in recent years has been
accompanied by interest in the theory underlying computer and computer
programming processes. Searches for theoretical foundations of information
processing are legion. A brief description of a number of outstanding
developments in this field with regard to computer applications may be found
in the article by
William McGee.' * One of these studies may be cited here. 

Quite relevant to the concept of data elements is the work conducted by
the SHARE Committee on Theory of Information Handling (TIH). A definitive
report of the TIH Committee in 1959 established certain basic data processing
concepts that are both extremely influential and highly relevant to the process
of structuring terms. Among the fundamental concepts were entity (an object,
person, or idea capable of being described for data processing purposes);
property (characteristics in terms of which entities are described); and
measure (value assigned to properties). A datum was defined as the smallest
unit of information, consisting of the triple $D_{ij}$, where $D$ is a
measure, $i$ is the index of an entity and $j$ is the index of a property.
The unit record was considered on the basis of this structure as a
one-dimensional array of datum triples, in which the index $i$ is fixed and the
index $j$ ranges over all properties being represented; the file was conceived
as a two-dimensional array, with index $j$ varying over all properties and
index $i$ varying over all entities. Essentially, this represents a
specialized case of the general classification $R_{ij}$, where $R$ is any
relation, and $i$ and $j$ are any tabular indices. Later work of the Committee
developed the concept of a generalized array whose elements have the general
form of an ordered pair $(v, x)$ in which $v$ is a data value corresponding to
an argument $x$. Arguments are expressed as a set of ordered pairs

$$[(v_1, P_1), (v_2, P_2), \ldots]$$

where $P_i$ indicates the name of the array dimension and $v_i$ a value of the
corresponding dimension. A two-dimension generalized array in which one
dimension is property and the second is entity is equivalent to the original
two-dimensional array. Here, the data value $v$ in an element corresponds to
the datum $D_{ij}$.
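These concepts can be sketched in a few lines of Python. The entity names, property names and sample measures below are invented for illustration; only the structure (the datum indexed by entity and property, the unit record as a row with the entity index fixed, the file as the full two-dimensional array, and the generalized-array element as a value paired with its argument) follows the description above.

```python
# Illustrative sketch; entities, properties and measures are made-up sample data.
entities = ["journal article 1", "journal article 2"]   # indexed by i
properties = ["Volume", "Page", "Year"]                  # indexed by j

# The file: a two-dimensional array of measures, data_file[i][j] = D_ij.
data_file = [
    [12, 101, 1967],
    [12, 145, 1967],
]


def datum(i, j):
    """Return the datum D_ij: the measure of property j for entity i."""
    return data_file[i][j]


def unit_record(i):
    """Return the unit record for entity i: i fixed, j over all properties."""
    return [datum(i, j) for j in range(len(properties))]


def generalized_element(i, j):
    """Return the generalized-array element (v, x).

    v is the data value; the argument x is a set of ordered pairs,
    each giving a dimension's value and that dimension's name.
    """
    x = frozenset({(entities[i], "entity"), (properties[j], "property")})
    return (datum(i, j), x)


print(unit_record(0))
print(generalized_element(1, 1))
```

In this reading, a generalized array with the two dimensions "entity" and "property" carries exactly the same values as the two-dimensional file.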

The concept of data element, originally advanced by O.Y. Evans,(2) is
equivalent to the notion of property advanced here. "Property" is also the same
as the USASI X3.8 concept of data element and coincides with the X3.8
contention that the data element names the kind of data or data items (entities)
which make up its class. However, each standard data element represents one
unique and defined property which ranges over a set of entities (data items).

The entities may, in addition, have certain properties of their own which 
are either independent of or subordinated to the data element property. The 
data element "Corporate Author" has, for example, the property (explicitly 
stated by its definition) of being an organizational source responsible for 
the writing and generation of a publication or document. 

The data items or entities which are subordinated to this element in the
hierarchical structuring may be: "Name of Organization - Largest Unit"; "Name
of Organization - Smallest Unit"; "Location"; etc. There may be in turn a
superordinate data grouping which includes a number of data elements. The
term "Author Entries" is, for instance, the data grouping which includes the
elements "Personal Author Entry" and "Corporate Author Entry." It also
includes the information null class of "no author given."

Proper structuring of data elements consequently supplies a hierarchical
arrangement of concepts in which the basic one-dimensional value(s) $v$, or
the original TIH datum or data $D_{ij}$, are subsets of a set (called data
element) which may itself be a member of a family of subsets (data grouping).
The data element vocabulary or data structure is therefore the set of all
subsets.

Structuring presumes a law of types which prevents the properties and
data entities from being confused with the classes of which they are members.
The data can thus be named without mistaking the name of the kind of data with
the name of each example of the data, or with the individuals represented by
the data. This structuring enables different - higher and lower - levels of
meaning to be designated. Once designated and fixed, once ranked according
to appointed levels of generality and specificity, the data couched in
natural language can readily be manipulated in machine-readable form as well
as translated, if need be, and interchanged from system to system.

B. Term Levels

The discussion of structuring above tried to demonstrate that the
terms used in the data element vocabulary may be considered generics and
ranked in fixed positions relative to one another in a hierarchical scale.
Uniformity and consistency are achieved by standardization. An example is
given in Section IX.C below, showing how this assignment of relative ranking
and fixing is performed on two data elements, "Symbols" and "CODEN." The data
element "Symbols," the generic, denotes or contains the data item "Codes
(General)." On the other hand, a whole code structure is named by the data
element "CODEN," the name of a specific kind of code. Its data items in turn
are the peculiar characters that indicate the titles of particular journals.



Terms appear at various levels to which they are assigned by convention 
or by the standardizing consensus principle. But a word of caution must be 
said lest standardization unwittingly restrict our handling of these semantic 
tools. Data processing and transfer must continuously respond to the needs of 
our vocabulary, not vice versa. The possibility always exists that because of 
some fixed structured approach, perhaps such as tree structuring, certain 
meaningful groupings of terms or retrieval strategies will remain unused. It 
is probably quite simple to avoid such rigidity by being aware of these
pitfalls. For instance, the case may be imagined where we have a conceptual
cluster structured in the following tree pattern:

Kinds of conceptual expression
    Signals
        Gestures
        Semaphore
    Symbols
        Sound
            Speech
        Alphanumeric Characters
            Codes (General)
                CODEN
            Punctuation

Standardization of "Codes" under "Alphanumeric Characters" exclusively may
reduce our ability to retrieve "Speech pattern recognition" as, for example, a
code used to identify individuals.
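The risk can be made concrete with a small sketch. The parent assignments below follow one reading of the tree pattern above, and the extra link from "Speech" to "Codes (General)" is a hypothetical cross-reference added only to show the effect; neither is prescribed by the experimental file.

```python
# Strict tree: each term has exactly one parent (one reading of the cluster).
tree_parent = {
    "Signals": "Kinds of conceptual expression",
    "Symbols": "Kinds of conceptual expression",
    "Sound": "Symbols",
    "Speech": "Sound",
    "Alphanumeric Characters": "Symbols",
    "Codes (General)": "Alphanumeric Characters",
    "CODEN": "Codes (General)",
}

# Hypothetical non-hierarchical cross-links kept alongside the tree.
cross_links = {"Speech": ["Codes (General)"]}


def ancestors(term, parent):
    """Walk up the tree and return every broader term above `term`."""
    found = []
    while term in parent:
        term = parent[term]
        found.append(term)
    return found


def broader_terms(term, parent, links):
    """Broader terms reachable through the tree plus any cross-links."""
    found = set(ancestors(term, parent))
    for target in links.get(term, []):
        found.add(target)
        found.update(ancestors(target, parent))
    return found


in_tree_only = "Codes (General)" in ancestors("Speech", tree_parent)
with_links = "Codes (General)" in broader_terms("Speech", tree_parent, cross_links)
print(in_tree_only, with_links)
```

With the tree alone, a search on "Codes (General)" never reaches "Speech"; the cross-link restores that retrieval path without disturbing the standardized hierarchy.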

One method used to differentiate between the data item, data element,
and data grouping levels is to identify the level of each type of data
collected into arrays (defined as n-dimensional collections of elements, all of
which have hierarchically identical attributes) or other structures.
Identification of level by means of a simple numbering device, the use of a
two-digit numeric sequence, was tried out in the experimental data element file.

It is thus possible to display the hierarchical ranking of the components
of a data element structure by simply counting the hierarchical level.
The most general term is numbered "level 1" and the next "level 2" and so on.
We could have, for example, the data grouping "Author" (level 1), the data
grouping "Member of the American Physical Society" (level 1), composed of
data elements "Forename" (level 2), "Middle Name or Initial" (level 2),
"Surname" (level 2), which denote the data items "John Q. Smith" (level 3),
"Mary J. Jones" (level 3). Another possibility is to have a subordinate data
grouping under "Author" (level 1), say, "Corporate Author" (level 2),
"Individual Author" (level 2). Then the data element "Surname" is
(level 3), the data item "John Q. Smith" is treated as (level 4), etc.
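A level number of this kind can be derived mechanically by counting ancestors. The sketch below assumes the second arrangement described above, with the most general term at level 1; the nested-dictionary layout and the two-digit formatting are illustrative choices, not the layout of the experimental file.

```python
# Hypothetical nesting: grouping -> sub-grouping -> element -> item.
structure = {
    "Author": {
        "Individual Author": {
            "Surname": {"John Q. Smith": {}},
        },
        "Corporate Author": {},
    },
}


def assign_levels(node, level=1, out=None):
    """Map every term to a two-digit level string, counting from the top."""
    if out is None:
        out = {}
    for name, children in node.items():
        out[name] = f"{level:02d}"
        assign_levels(children, level + 1, out)
    return out


levels = assign_levels(structure)
print(levels["Author"], levels["Surname"], levels["John Q. Smith"])
```

Inserting or removing an intermediate grouping changes the numbers of everything beneath it, which is one reason the level is better computed from the structure than stored by hand.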







C. Classification



In addition to the natural or assigned structuring of a data element
whereby certain attributes are generically arranged or attributed to its
constituent data items, the device of classification was assayed.

As we have seen above, the structuring of a data element implies a
classification. For it requires a two-dimensional matrix, where an entry
in cell $R_{ij}$ indicates that the relation $R$ holds between the entity,
property or term $i$ and that of $j$. But the data element also requires the
process of standardization which tags the particular data element as a generic.
It must by rule remain in a generic relation to its subordinate constituents or
list items. That is to say, the data element is by fiat confined to a tree
structure.

But the classification is not necessarily subject to this same restric- 
tion. Hierarchical structuring or levels are not mandatory. 

To enrich the data element vocabulary, and to explore different 
organizational syntaxes, various classification schemes were developed and 
applied to the collection. 

The first classification was radically pragmatic. It was simply a 
breakdown of the subject field as required by an overview of all the data 
elements on hand: 

I. Subject Field 

SYSTEMS ANALYSIS AND DESIGN CATEGORIES: 

A. Bibliographies and bibliographic components (including document 
description) 

B. Authorship, generation, production 

C. Collection, formal literature, storage (incl. all forms of com- 
puter storage and file structures) 

D. Intellectual organization (incl. content analysis: indexing, 
classification, abstracting, annotation, reviews; excludes manage- 
ment functions, systems analysis and design) 

E. System analysis, design, planning 

F. Equipment (hardware and software) 

G. Processing 

H. Retrieval 

I. Dissemination, communication, publication, products (primary, secon- 
dary and tertiary) 

J. Transfer, translation, conversion 

K. Services, system structures, installations, centers, organizations,
missions

L. Use, users, ends, needs, plans, channels

MANAGEMENT, ADMINISTRATION AND AIP FILE CATEGORIES:

M. Management, control and processing command 

N. Personnel 

O. File 

P. Biography 

Q. Historical event 



R. Manpower, education 

S. Subscription fulfillment

T. Doctoral program 

U. Legal requirements 

The main merit of this scheme was that it worked better on the
data elements at hand than the others. But the schedule was cumbersome,
inelegant, arbitrarily designed, and neither systematic nor transparently
exclusive. However, simply abbreviated versions of the subject field division
failed to be sufficiently discriminatory when applied to indexing and
retrieval functions.

One of the classification and coding schemes which was suggested, and
is now used on the data element description sheet shown in Appendix II, is
as follows:

Simplified Classification

Field 1         Field 2           Field 3

1. Systems      1. Components     1. Hardware
                                  2. Software
                2. Aspects        1. Subject
                                  2. Physicists
                                  3. Users
                                  4. Institutions
2. Files

Alternative classifications were developed or borrowed. The traditional
classification systems, such as Colon Classification, Universal Decimal
Classification, or even the Perry and Kent semantic code were found to be too
general or non-explicit for the designation of the data elements in our three
sectors.

Methods describing elementary items according to the following charac- 
teristics were considered: 

Class (alphabetic, numeric, alphanumeric) 

Size (number of characters or digits) 

Sign (signed or unsigned) 

Justification (left or right) 

Synchronization (correspondence between item value and computer words)

Punctuation (position of editing symbols, etc.)
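A description of this kind might be recorded as follows. The sketch is an assumption-laden illustration: the class name `ItemFormat`, its field names and the `render` helper are invented here, and synchronization and punctuation are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemFormat:
    """Physical description of an elementary item (illustrative subset)."""
    item_class: str       # "alphabetic", "numeric" or "alphanumeric"
    size: int             # number of characters or digits
    signed: bool          # signed or unsigned
    justification: str    # "left" or "right"

    def render(self, value):
        """Pad `value` to the fixed field size according to the justification."""
        text = str(value)
        if len(text) > self.size:
            raise ValueError(f"{text!r} does not fit in {self.size} characters")
        if self.justification == "left":
            return text.ljust(self.size)
        return text.rjust(self.size)


page_number = ItemFormat("numeric", 5, False, "right")
surname = ItemFormat("alphabetic", 8, False, "left")
print(repr(page_number.render(42)), repr(surname.render("Smith")))
```

Such characteristics describe how an item is laid out, not what it means, which is why they were considered separately from the semantic classifications.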

The classification scheme worked out at the Mitre Corporation for data element
regulation of a command and control system was taken into account:




Presentation Form

R - Representation
N - Name
V - Value
E - Either (a name = alphabetic or a value = numeric)

Document Description

I - Implied
P - Punctuation

Attributes

L - Location
I - Identification
Q - Quantity
C - Condition
0 - Not Applicable

Basic Group

T - Time
S - Space
A - Mission
W - Weapon
L - Plan
R - Provisions
E - Equipment
F - Facility
P - Personnel
M - Environment
V - Event
N - Natural Feature
D - Document
O - Organization
X - More Than One
0 - Not Applicable



The following simple and clearcut classification was applied
together with the more complicated schemes:

Systems                Files

1. Components
      Hardware
      Software
2. Aspects
      Subject
      Physicists
      Users
      Institutions

Adaptations of the Mitre scheme were made for functional purposes
(ignoring the non-exclusive and mixed type character of the classes):

II. TYPE OF FORM OR CONTENT

A. Document, message or component

B. Sign, symbol or their conventional representation, including
punctuation

C. Code

D. Documentation file

E. Event, process, product or doer

F. Fact (concept)

G. System or system component

H. Value

0 Not applicable

III. ATTRIBUTE FACETS

N. Name or identification

S. Space, geography, location

Q. Quantity

L. Quality

C. Condition, status, development, state of affairs, state of existence

T. Time (duration or point of time)

E. Evaluation, analysis, conceptual synthesis

0 Not applicable

A somewhat less objectionable, although less useful, version of the
Mitre approach was advanced:

PRESENTATION FORM

R - Representation of
    1) a name of a series of digits
    2) conventional symbols and signs
G - Graphic
N - Name
V - Value
M - Mixed
C - Code
A - Abbreviations, acronyms
P - Punctuation
I - Instruction symbol

For the less than three hundred data elements in the experimental file, the
facets for type of form or content and the attribute facets seemed
superfluous. Naturally, it was recognized that eventual expansion of the number
of elements in the data element control device could make a classification
most useful and possibly even rely on the descriptive value of these facets
for retrieval purposes.

CLASSIFICATION TAG 

One device to aid in the identification and recognition of the data
element was the use of a classification tag. The tag formed part of the
code attached to each data element. The data element "Number of Man Hours,"
for example, was given the tag EEQ to signify: Position I: E = System
analysis, design, planning; Position II: E = Event, process, product or
doer; Position III: Q = Quantity. A mnemonic abbreviation was employed as an
alphabetic code, yielding NOMHR for the example. The combination of tag and
mnemonic produced a unique code for "Number of Man Hours": EEQ.NOMHR.
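The construction of such a code can be sketched as follows. The lookup tables reproduce only the three facet values used in the example; the function name and the validation step are assumptions made for illustration.

```python
# Only the facet values needed for the "Number of Man Hours" example are listed.
SUBJECT_FIELD = {"E": "System analysis, design, planning"}      # Position I
FORM_OR_CONTENT = {"E": "Event, process, product or doer"}      # Position II
ATTRIBUTE_FACET = {"Q": "Quantity"}                             # Position III


def element_code(subject, form, attribute, mnemonic):
    """Join a three-position classification tag and a mnemonic into one code."""
    for key, table in ((subject, SUBJECT_FIELD),
                       (form, FORM_OR_CONTENT),
                       (attribute, ATTRIBUTE_FACET)):
        if key not in table:
            raise ValueError(f"unknown facet value: {key!r}")
    return f"{subject}{form}{attribute}.{mnemonic}"


print(element_code("E", "E", "Q", "NOMHR"))  # EEQ.NOMHR
```

The tag carries the classification and the mnemonic carries the identity, so two elements with the same tag remain distinguishable.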



ATTRIBUTE

N - Number
P - Percent
T - Type, kind
L - Location
Q - Quantity
C - Condition, incl. quality
I - Identification
0 - Not applicable



DISCUSSION 



The classification schemes that were developed and tried out (by
comparing the existing data element file with possible new entries) proved
interesting and successful as far as discrimination and sorting were required. But
one faced the dilemma that each scheme entertained was either rationally faulty
and useful or elegant and worthless.

An objection in principle to an extraneous classification scheme might
be made in behalf of the data elements themselves. Each data element is the
unique name of a quality or relation. Each utilizes this uniqueness to
describe and order the specific set of data items which it denotes. Further
capitalization on classifying or ordering schemes might, even if they do
not totally confuse the question of exclusive description, prove unnecessarily
redundant.

So much for the descriptive capacity of classification. Other horizons
are still open, however, and may merit further exploration than was possible
in this study. The use of classification to interrelate and organize data
elements into syntactic patterns useful for retrieval purposes might be
investigated. The tree structuring required by the data elements bound to their
data items, when combined and matched with non-hierarchical strings, may
yield interesting correlations between the data items in different sets. Such
combinations could prove interesting in analysis of management files.






IX. THE THREE AIP SECTORS

Let us now turn to a discussion of the three sectors identified in
Section VII.

A. Bibliographic Data Elements 

Following the analysis of physics journals made by Inforonics, Inc.,
in its attempt to derive unambiguous and retrievable bibliographic items,
a list of "information items" (data elements) was compiled:

Journal title
Volume
Number
Date
Issue Title
Article Title
Abbreviated Article Title
Author(s)
    Forename
    Middle Name or Initial
    Surname
Author Affiliation(s)
Place of Presentation of Paper
Date of Manuscript Submission
Page Number
Abstract
Body of Text
Heading
Sub-heading
Sub-sub-heading
Figure
Figure Caption
Table
Table Caption
Table Footnote
Equation
Footnote
Author (present author address)
Title (sponsor)
Text
Citation
Non-alphabetic Symbols and Symbol Sequences
Text Structure Data
Section
Paragraph
Sentence
Word
Character



Additional material not related to the primary journal product (copyright
statement, information for contributors, and indexes) was omitted from the
original Inforonics item identification list.






Certain identification requirements created by secondary use of data
are noted in the Inforonics report and differentiated according to the type of
use. The first type involves the creation of bibliographic reference tools,
such as author, title, and subject indexes. The second type is required for
text extraction, such as abstract journal entries, announcements of
publication or compressions required for review. Analysis was made of the contents,
forms, formats and procedures required for the different types of indexes
and text presentation. The present study accepted with slight modification
the basic list of data elements suggested by the report. However, further
analysis of individual elements resulted in somewhat different structuring.
Rather than data element "Author," for example, the experimental file
contained data group identifier "Personal Author," which was a composite of
data elements "Forename," "Middle Name," and "Surname." The data use
identifier for "Personal Author" may be, e.g., for the Bibliography units
"Journal Article," "Textbook," "Paper in Conference Proceedings," etc. A later
listing of the data elements required for journal article presentation which
was developed by Inforonics, Inc. and Vance Weaver Composition, Inc. was
adopted in the experimental file and is reproduced in Appendix I.

Numerous groupings of potential data elements presented themselves as 
this area was further explored. Over two hundred plausible candidates for 
the experimental file were suggested by the invaluable work by Ann T. Curran 
and Henrietta D. Avram, The Identification of Data Elements in Bibliographic
Records.

On the other hand, the actual operational file used at AIP during the
study contained only seven bibliographic data elements. The file was the
machine-readable store of physics journal literature maintained in the
Technical Information Project (TIP) store in the MAC system at Massachusetts
Institute of Technology. The seven data elements made up the "unit record,"
containing journal, volume, page(s), title, author, affiliations, and
citations. Later additions to the unit record will include physics subject
classification number(s), index terms, and possibly abstract(s). Comparison
between the TIP data elements and those used, planned or contemplated by
Physics Abstracts, published by the Institution of Electrical Engineers (IEE)
in London, disclosed several other candidates such as language of original
publication, paper number (including CODEN), corporate author, PA
classification code, etc. In addition, certain highly specialized collections at AIP,
such as the Bio-Bibliographic Collection of the Center for the History and
Philosophy of Physics, maintain such extraordinary data elements related to
manuscripts, tape recordings, apparatus, that a separate division of elements
seemed warranted.

All of the candidates were noted. Many that were synonymous or nearly 
synonymous or translatable were noted as such. But certainly not all 
possibilities were incorporated into the file. 

A number of options or criteria for selecting the data elements were
available. One could use a stockpiling approach and enter all possible,
usable or unusable, elements into the file. Or one could apply canons of use.
If the first approach was taken, about a thousand - mostly unusable - elements
would be selected for bibliographic purposes. If the latter was chosen, it
would be necessary to require that the elements be actively used at AIP. In
such a case, only seven elements could be allowed. The more practical
criterion, requiring that the elements be used in interchange between system
interfaces, a method recommendable for later operational implementation of a
data element structure, was even less adoptable. For no active interchange
between systems was occurring - except for the work of IEE, the details of
which were not yet available.

Selective adoption of elements was the course elected among the
alternatives. One of the criteria for selection was whether the elements would
probably continue to be used for journal input to a primary data base. Elements
from TIP, Physics Abstracts and several AIP primary and a few other
secondary journals (e.g. Nuclear Science Abstracts, Science Citation Index) seemed
likely to remain significant for primary input as well as bibliographic
purposes. The actual keyboarding operation when journals are being inputted to
the computer for later photocomposition would undoubtedly require still finer
differentiations, especially for certain mixed entries appearing in footnotes and
citations. Naturally, the assemblage of bibliographic data elements in the
experimental file was considered tentative. The present collection will
unquestionably differ from the later operational file. The full testing of
the data element control structure to determine its efficacy in facilitating
data interchange and perhaps even in performing its secondary data retrieval
function can be properly made when the system is fully operational. The
present study has addressed itself principally to exploring a method of handling
such a control structure.

Listings of sample bibliographic data elements from the experimental
file may be seen in Appendix I. Appendix V shows a comparison of sample data
elements used in various systems at AIP.



B. AIP Files, Records and Resources

The second sector comprises AIP management and special services files.
It forms a crossroads for many of the Institute's services, services which
have accumulated considerable amounts of information. Most of this information
is organized; much of it can be represented in a data element vocabulary.
Consequently, the AIP files in the Subscription Fulfillment Division, the
resources of the Center for the History and Philosophy of Physics, the
cumulative author index of AIP journal contributions, the Education and
Manpower Studies publications, the Directories of Physics and Astronomy
Faculties and numerous other sources were examined for possible data elements of
interest to the AIP Information Program.



Generally, two classes of elements appeared in this survey: those of
possible interest from an information management perspective, and those which
dealt with physicists - individually and collectively - and with events in
the subject matter and history of physics.



Each division and service was looked at without preconception or
design. The analysis of each area was carried out without consideration of
possible requirements of existing AIP operations. No thought was given to any
changes in existing practices. Only the possible data element vocabulary
which each could yield was considered. In this manner, it was possible to
characterize the classes of data as they now appeared. Their possible
utility for management decision making or for special tabulations, listings and
products could be considered on the basis of the elements alone.

1. Subscription Fulfillment

The Subscription Fulfillment Department has been converting its
automated punched card system to a magnetic tape file for operation on a
Univac 9300. The data elements required in the three major records have been
clearly identified and coded. The three major records consist of a master
history record file, a society record and a journal record. A few samples of
the data elements used in these records may be instructive:


Field Number    Field Name (Code)    Content (Data Element)

1               CC                   Card Record Code
2               TRAVC                Transaction Code
3               ACCT                 Account Number
4               RECT                 Record Type
5               ZIG                  ZIP or Geographic Code
6               FNAME                First Name
7               MNAME                Middle Initial
8               LNAME                Last Name
9               LINE 2               2nd Line of Address
10              TITLE                Title
11              STAC                 State Code
12              LINE 3               3rd Line of Address
13              LINE 4               4th Line of Address

                DTSUS                Date of Suspense



Field numbers 6, 7, 8 and possibly 10 contain information that appears as
bibliographic data elements.



The Society Record contains certain biographic information of possible
interest to historians of science. This is indicated by the data elements
"Date of Election (to membership)," "Date of Promotion to Highest Class,"
"Date of Birth." The Journal Record contains an important code structure for
the primary physics journals published by AIP. Another field of possible
use in the dissemination of information is indicated in the Society Record
by the data element "Area of Interest."

Out of the 41 data elements listed in the experimental file for the
subscription fulfillment function, ten elements are of possible significance
for conventional information purposes. Varieties of management statistics
may be obtained by means of other data elements. Such statistical reports
might conceivably cover various segments of the journal subscribers,
classified according to, say, areas of interest or society membership.
Circulation and business data needed for publication control as well as
day-to-day operations are readily identified by the classes of data elements.
It may conceivably become possible some day to perform statistical forecasting,
without excessively complex new programming, based on the data organized
by the data elements. One might, for example, not only be able to identify
the individuals and organizations interested in a new class of services, but
predict the volume of subscription according to past performance in other
media.



2. Center for the History and Philosophy of Physics 

One of the nation's significant repositories of information about
physicists and events in physics, this Center is based on a nucleus of
several manually tended collections. These comprise the National
Catalog of Sources, the Bio-Bibliographic Collection, the files of the
Oral History Project (made up of tape recordings and transcripts of
interviews) and the Niels Bohr Library, which houses about 5,000 volumes
and a unique historical archive containing manuscripts and documents.

As might be expected, the high degree of organization of these
collections was instrumental in supplying many data elements. The cataloging
for the Niels Bohr Library reflects standard national practices.
The manuscript collection, for instance, follows the prescribed data sheet for
the National Union Catalog of Manuscript Collections recommended by the
Manuscripts Section of the Library of Congress.

Data elements, data groupings or data use identifiers suggested by
this cataloging process include, e.g., "Name of Repository," "Principal Name
around Which the Collection Is Formed," "Occupation of Principal Person,
Family or Corporate Body," "Form of Manuscript Reproduction" with data
items: "Handwritten Transcripts," "Typewritten Transcripts," "Positive
Photocopy," "Negative Photocopy," "Positive Microfilm," "Negative Microfilm,"
"Number of Microfilm Reels," etc. The data element "Types of Papers" has data
items: "Correspondence," "letters," "diaries," "documents," etc.

The National Catalog of Sources for the History of Physics maintains
a card catalog from which information may be retrieved concerning historical
source materials such as manuscripts, diaries, experimental notebooks,
interviews, correspondence and apparatus. The data elements or data use
identifiers "Name of Physicist," "Name of Organization," and general subject
terms, e.g. "Nuclear Physics," "Accelerators," "Administration of Science,"
etc. are used here.

The Oral History Collection on Twentieth Century Physics maintains
records that contain the following data elements and data use identifiers:
"Tape Number(s)," "Name of Interviewee," "Date of Interview (or speech),"
"Number of Hours of Interview," "Date Sent to Transcriber," "Transcriber's
Name," "Date Transcript Was Received from Transcriber," "Number of Pages of
Transcript," data element "Corrections" with data use identifier "Completed
Date of Corrections Made on Transcript Against Tape," etc.

Many of the data elements derived from other divisions coincided with
common bibliographic elements or groupings, such as "Name of Organization,"
"Surname," "Address," etc. Some data elements, such as "Press Release" and
"Visitor's Evaluation of the Institution" from the Visiting Scientist
Program, require a full text as their data item. Others, particularly from
Education and Manpower Studies, presented contexts which required considerable
analysis to derive a standardized data structure. For example, the concept
and data expressed in a table entitled "Size of secondary school attended
by doctorate-holders" were treated as follows. Data element: "Size of
Secondary School"; its data items: "School Enrollment Size 1-19," "School
Enrollment Size 20-59," etc. through "School Enrollment Size 500 or more."
Data element: "Percentage of Physics Doctorate-Holders"; its data items: the
values "5.0," "31.2," etc. For both, the data use identifiers are the dates
"1960-1," "1961-2," etc. For comparison, there were the data elements
"Percentage of Chemistry Doctorate-Holders," "Total Doctorate Holders," etc.
The importance of such elements lies in their potential use for storage,
manipulation and perhaps retrieval, probably in the handling and control of
questionnaires of similar multiple forms.
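
The treatment of the secondary-school table can be sketched in a few lines
of Python. This is only an illustration of the structure described above, not
a record of how the experimental file was kept (it was a manual file). The
dictionary layout and the function name table_cells are assumptions made for
the example, and only the data items actually quoted in the text are listed;
the items elided by "etc." are left out.

```python
# Data elements from the "Size of secondary school attended by
# doctorate-holders" table, each with the data items quoted in the text.
size_of_school = {
    "data_element": "Size of Secondary School",
    "data_items": [
        "School Enrollment Size 1-19",
        "School Enrollment Size 20-59",
        "School Enrollment Size 500 or more",
    ],
}
percentage = {
    "data_element": "Percentage of Physics Doctorate-Holders",
    "data_items": ["5.0", "31.2"],
}

# Data use identifiers (dates) that apply to both data elements.
data_use_identifiers = ["1960-1", "1961-2"]


def table_cells(row_element, use_identifiers):
    """Return one (data item, data use identifier) key per table cell."""
    return [
        (item, use)
        for item in row_element["data_items"]
        for use in use_identifiers
    ]


cells = table_cells(size_of_school, data_use_identifiers)
print(len(cells))  # 3 enrollment sizes x 2 dates
```

Each key identifies one cell of the original table; a value from
"Percentage of Physics Doctorate-Holders" would be stored against it.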

Finally, study of the Institute's needs for a management information
system and its feasibility might be considered at some future time. A
centralized data element file will probably help to implement such an
undertaking. Naturally, before designing a data element file for such a
purpose, the desirability and need as well as the details and magnitude of
AIP's requirements would have to be specified by prior study.

C. System Analysis Vocabulary 

The language needed to perform system analysis, design and development
activities in an information program is different from the vocabularies of
the first two sectors in several ways. The subject matter is necessarily
different; the lists of data items named by the element generally resemble
inventories; and the function of the entire structure seems more to serve
internal "housekeeping" purposes than to act as an interchange language
between systems.

The following five types of terms have been adopted as system analysis
data elements: 1) Names for manifolds such as files, catalogs and inventories;
2) Relatively self-contained concepts or systematic overviews that have
familiar, conventional or readily assignable sub-divisions, such as an integral
classification scheme, for example, the new hierarchical scheme for physics;
3) Names of key activities, functions, processes or operations in an
information handling establishment, which can be broken down into a finite
number of steps (data items), e.g. "Retrieval," items: subject analysis,
question formulation, encoding, search of system, etc.; 4) Criteria or
evaluations that require a checklist for their data items, e.g. the hardware
capability of a "Processor": data items - number in line, speeds, parity
check - I/O failure checking, computer circuitry, etc.; and 5) A category of
miscellanea, somewhat less institutionalized than 1), that describes sundry
items which belong to a real world of less than coherent objects,
bric-a-brac, rules, commands. Codes would be named by such a category; the
code units would form its data items.

To illustrate the first type we can cite the data element "Name of
System," which appears as a member of the data grouping "General Description
of a System." Some data items for this element are "National Physics
Information System," "Clearinghouse for Federal Scientific and Technical
Information," "Science Information Exchange," etc. The data element "AIP
Information Files" has data items: "Information Division Library Card
Catalog," "TIP Store," "Education and Manpower Department Records."

Data elements illustrative of the second type of system analysis terms
are "Universal Decimal Classification," "Colon Classification," "Newton's
Laws of Motion." The contents of each classification system have been
construed as the data items, as have the descriptions and series of equations
for each of the three laws of motion formulated by Newton in 1687. The data
grouping for the first two elements is "Classification Schemes" or
"Classification Schedules," depending upon what is meant.

The third type names activities or processes, such as "PERT Analysis,"
or "Benefits from Data Element Standardization." The latter might have the
data items: 1) "Facilitation of interchange and compatibility of data
among different data processing systems," 2) "Reduction in total number of
data elements and codes," 3) "Reduction in processing costs by using
standard codes instead of full descriptors," 4) "Facilitation of the
development of standard information and data systems by standardizing the
elements and codes," 5) "Facilitation of systems integration and direct
computer-to-computer information transfer."

The fourth type supplies those criteria or canons for evaluation that
require lists which every 'expert' should always have on the tip of his
tongue or at least at his fingertips, e.g. data element "Capabilities of a
Software Master Control System." Data items: 1) "Static or dynamic storage
allocation," 2) "Controls," 3) "Interrupt Handling," 4) "Task Scheduling,"
5) "Multiple Processor Capabilities," 6) "System/Operator Interface,"
7) "Debugging Features," 8) "Accounting Capability," 9) "Programming and
Data Protection," 10) "Time-Sharing (Conversational)," 11) "Foreground/
Background Processors," 12) "Processors under Control of Master Control
System," 13) "Device Independent."

The fifth type contains a category of somewhat arbitrarily chosen terms.
Many of the bibliographic terms could be data elements in this sense. For
instance, take the composite term or data use identifier "Unit Records for
Physics Information" used by Physics Abstracts. Its component data elements
are, inter alia, "Paper Number," "Chapter Code," "Author's Affiliation," etc.

The system data element "Symbols" might have data items "Characters,"
"Signs," "Numerics," "Punctuation," or even "Codes." However, "Codes (General)"
will require a special data use identifier. For, due to the standardization
process, the name of a particular code, say, "CODEN" for serial title
abbreviations, may stand for a data element, the data items of which are the
specific codes within it that designate each periodical title. The latter
course for structuring CODEN was followed in the experimental file.

This fifth type occurs more typically in such data elements as "Number
of Respondents (to a questionnaire)," "Type of WIC [Written Informal
Communication]."

Data elements of the type used in systems analysis may be used to index
tables. For example, the headings "Media" and "Percent Selecting" can be
regarded as data elements in the following table. The data items of "Media"
are then "Journal and Abstract Indexes," "Regular Scanning of Literature," etc.
The data items of "Percent Selecting" are "21," etc. When the data elements
are matched in the same array and their data items are correlated in a specific
matrix, the full table is reproduced:



Media                             Percent Selecting

Journal and abstract indexes      21
Regular scanning of literature    [?]5
etc.                              etc.



The system data element vocabulary stock could turn out to be extremely
large even when the terms are selected with restraint. Before incorporating
this vocabulary, consideration must be given to the resources and requirements
of the system. Appendix I presents a few additional examples of the data
elements from this sector, accommodated by the experimental file.







X. THE EXPERIMENTAL DATA ELEMENT FILE 

A. Overall Features

An experimental manually operated file, a possible prototype of an
operational system, was set up in the AIP Information Division. Its purposes
paralleled the study. The file structure was exploratory. It was an
attempt to raise questions and find solutions at a microcosmic level which
could be applied to the larger system of which it was a model.

The function of the file was to accommodate the data element structure,
to integrate all of its relevant parts, especially the "building blocks" of
the information system, the data elements. The data elements were identified,
processed, and then recorded by the mechanism of the file.

The principal recording instrument was a standard Data Element
Description Sheet. Several versions of the sheet were drafted and employed.
Appendices II and III reproduce two of these versions. Additional forms were
used for cross-references and qualifiers.

B. Detailed Components
Data Element Description Sheet


It is essential to the standardization process that each data element
be identified, registered, defined, classified, and coded in a uniform manner.
To make this possible within the framework of an experimental data element
file, a principal recording instrument, the standard Data Element
Description Sheet, was drafted.

Data Element Name

Each data element is given a unique and unambiguous name. Its meaning
must be clearly distinct from that of every other data element. Generally, the
name consists of a noun or noun phrase, common segments of which may be
regarded as qualifiers of the more unique portion. In our example, "Number of
Man Hours," "Number" is a qualifier of "Man Hours." In exceptional cases, where
ambiguity can arise, a term qualifier or scope note (entered in parentheses
after the name) may be used to distinguish this element from another.

Qualifiers

Qualifiers are entered on a separate qualifier sheet, and are alpha- 
betically interfiled with the Data Element Description Sheets. Synonyms of 
qualifiers are noted. 

Data Element References 

Names or abbreviations differing in form from the data element name
are entered on the line appropriate for AIP usage or that of an outside
organization with which the data element is identical.

Data Element Code 

The unique coded reference to the data element used by AIP and/or other
organizations is recorded. The mnemonic "NOMHR" is given in the example.









Type 

Indication of whether the entry under data element name is a data
element (basic) or a data element grouping (composite). Basic data elements
are, e.g., "Year," "Month," "Day of Month." The composite or data grouping
is "Date." In the latter case, the data elements "Year," etc. are entered on
the line Data Items.
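
The basic/composite distinction can be sketched as a small data structure.
The class below is a hypothetical illustration, not a transcription of the
Data Element Description Sheet: the class name, the components field and the
element_type property are assumptions introduced for the example. Only the
names "Date," "Year," "Month" and "Day of Month" come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    """One entry in a data element file (illustrative structure only)."""
    name: str
    # For a composite (data grouping), the member data elements.
    components: List["DataElement"] = field(default_factory=list)

    @property
    def element_type(self) -> str:
        # An entry with member elements is a grouping; otherwise it is basic.
        return "composite" if self.components else "basic"


date = DataElement(
    "Date",
    components=[
        DataElement("Year"),
        DataElement("Month"),
        DataElement("Day of Month"),
    ],
)

print(date.element_type)                         # composite
print([c.element_type for c in date.components])
```

Deriving the type from the presence of components, rather than storing it
separately, is one design choice among several; it keeps the two from
disagreeing.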

Synonyms 

Synonyms of the data element are entered on this line and on a 
separate Reference to the Data Element sheet. See Appendix II. 

Data Group Identifier 

The name or designation given to a composite or combination of two 
or more related data elements. 

Data Group Reference 

A number, letter, code, or other symbol used to represent a data group 
identifier. 

Definition

An acceptable and distinct definition of the data element is entered in
cases of ambiguity between two data elements that are not resolved by
parenthetical term qualifiers or scope notes.

Data Items

Each data item classified under a data element will have a unique name
and meaning different from any other data item classified under that data
element. A data item may be given a unique abbreviation or code, entered under
Code. Each data item classified under a data element must have a homogeneous
characteristic that allows it to fit within the data element grouping. A data
item cannot be logically subdivided and still retain the significance of the
data element class.

Data Item Code 

Existing codes should be considered at the implementation stage to
minimize conversion. The code should be short to conserve
storage space and transmission time. Mnemonic abbreviations should be
considered as codes to facilitate human use and understanding. For
machine-to-machine interchange, numeric codes may be preferable. In any case,
the code should be designed to provide high reliability in interchange. The
code should allow for adding or inserting new members without having to recode
or expand the code length. Redundancy may be considered, when appropriate.
Where applicable, the code should provide for sorting so that, if a sort
operation is performed on the code, the members are ordered in the desired
sequence.
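
Two of these guidelines, room for insertion without recoding and a code that
sorts into the desired sequence, can be illustrated with gapped numeric codes.
The sketch below is one possible scheme under stated assumptions, not the
coding used in the experimental file: the step of 10, the three-digit width
and the function names assign_codes and insert_between are all hypothetical.
The data items are the reproduction forms quoted earlier in this report.

```python
def assign_codes(items, step=10, width=3):
    """Assign zero-padded numeric codes, leaving gaps between neighbours."""
    return {f"{(i + 1) * step:0{width}d}": item for i, item in enumerate(items)}


def insert_between(codes, new_item, lower, upper):
    """Give new_item a code midway between two existing codes."""
    lo, hi = int(lower), int(upper)
    mid = (lo + hi) // 2
    new_code = f"{mid:0{len(lower)}d}"
    if mid in (lo, hi) or new_code in codes:
        raise ValueError("no free code between %s and %s" % (lower, upper))
    codes[new_code] = new_item
    return new_code


codes = assign_codes(
    ["Positive Photocopy", "Positive Microfilm", "Negative Microfilm"]
)
# A later addition is slotted in without recoding the existing members.
insert_between(codes, "Negative Photocopy", "010", "020")

# Sorting on the code yields the desired sequence of data items.
ordered = [codes[c] for c in sorted(codes)]
print(ordered)
```

The fixed width keeps the code length constant and makes a plain character
sort agree with numeric order; the gaps are eventually exhausted, at which
point recoding or a longer code becomes unavoidable.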

Data Use Identifier 

Each data use identifier is different from any other data element or
related feature. A data use identifier may be given a unique mnemonic
abbreviation, entered under Data Use Reference. Data use identifiers apply
to the data items of the data element from which they are derived. Two
or more data use identifiers can be chained to each other in a prescribed
sequence and used as data group identifiers. For example, two data use
identifiers called "City of Birth" and "State of Birth" could be grouped
together to form a data group identifier called "Birth Place."

This arrangement is based on the notion that, when the data items of a
data element appear in a system, they are used in specific contexts and
have specific connotations. These uses do not change the class, the data
items or the basic definition of the data element. Such applications are
named by data use identifiers. For example, consider the data element
"States of the United States." The system may require "State of Birth." In
the system design, the terminology "State of Birth" could be used to name a
file, and would be designated as a standard data use identifier. Subsequently,
whenever it becomes necessary to use a data use term for "Birth State" or other
designation with the same meaning, the standard data use identifier "State of
Birth" should be used. Other examples of data use identifiers for the data
element "States of the United States" might be "State of Residence," "State
of Legal Residence."
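
The rule that a variant such as "Birth State" is replaced by the standard
data use identifier can be sketched as a lookup. This is a minimal
illustration under assumptions: the table names and the function standardize
are hypothetical, and a manual file would achieve the same effect through
reference sheets. The terms themselves are those given in the text.

```python
DATA_ELEMENT = "States of the United States"

# Standard data use identifiers derived from the data element above.
STANDARD_USE_IDENTIFIERS = {
    "State of Birth",
    "State of Residence",
    "State of Legal Residence",
}

# Variant data use terms mapped to the standard identifier.
SYNONYMS = {"Birth State": "State of Birth"}

# A data group identifier chains data use identifiers in a fixed sequence.
DATA_GROUPS = {"Birth Place": ("City of Birth", "State of Birth")}


def standardize(term):
    """Return the standard data use identifier for a term, or raise KeyError."""
    standard = SYNONYMS.get(term, term)
    if standard not in STANDARD_USE_IDENTIFIERS:
        raise KeyError("no standard data use identifier for %r" % term)
    return standard


print(standardize("Birth State"))  # State of Birth
```

Whatever the context, the data items remain those of the underlying data
element; only the name of the use changes.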

Classification 

Two experimental sets of classification terms were tested. The
question of classification is discussed in Section VIII.C - Classification.

Array 

A two-digit code, discussed in Section VIII.B - Term Levels.

Filing 

The experimental Data Element File is thus a manually handled
collection of Data Element Description Sheets, References and Qualifier Sheets,
filed alphabetically. In some cases, additional Data Element Description
Sheets are entered into the file under the names of significant data items
and data use identifiers.

C. Criteria

The definitions and guidelines used to specify and record the data
element as a class of data, whose members are data items, indicated initial
criteria which can be followed to identify the data element structure. Once
the terms have been identified and interrelated according to this structure,
the next stage is that of standardization.

Let us refer to the recommended procedure in the USASI X3.8 Technical
Guidelines to demonstrate the procedure followed in accommodating these
essential terms.

(1) To assure the broadest scope and application of the resulting
standard data element, research existing data systems, publications and forms
where similar or equivalent data elements are likely to exist. The first
thing to establish is the tentative list of data items which determines the
true meaning of the data element by establishing its homogeneous class
characteristic. As far as possible, lists of all data items involved should be
assembled or compiled.







(2) As research proceeds, data elements should be identified and
recorded to show: the title each time it appears; any data element definitions
which appear; any additional data items; and the publication, form citation,
or system in which they appear. The breadth and depth of research and
recording must be tailored to the situation. Obviously, any data element
with limited data items such as "Month" does not require an extreme depth of
research into systems before all the possibilities are exhausted. Nor will it
be necessary to record its use each time. On the other hand, a more complex
data element, such as organizational entity, exists in so many different
forms, in so many different contexts, in so many different permutations and
combinations that extremely broad and deep research must be undertaken before
the requirements and the existing solutions to these requirements are known,
which is the basis for development of a standard data element. (In other words,
all data elements are not the same in essential nature and therefore the rules
cannot be arbitrarily applied. The nature of the words in the language which
have been required and selected for use in data systems varies widely.
Contrast "Month" with twelve data item possibilities to organizational entity
which first requires extended conceptual development to determine exactly what
it is, how it should be defined, whether it needs to be divided into several
data elements, etc. long before the list of data item possibilities can be
examined in any detail toward standardization.)

(3) Survey the known and foreseeable data system requirements
of the various participants and interested organizations, in terms of the data
element(s) under consideration, and list them by tentative title. Again the
depth and breadth of the survey must be tailored to the situation.

(4) Study the elements of data extracted during the research. Select
and develop the one which most nearly meets each anticipated data requirement.
Name and define the element and its related features in accordance with the
criteria set forth above.
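
The recording called for in step (2) can be sketched as a simple log of
occurrences. The structure below is a hypothetical illustration rather than
part of the guidelines: the function names record and titles_seen and the
dictionary layout are assumptions. Treating "Last Name" (field 8, LNAME, of
the subscription fulfillment record) and the bibliographic "Surname" as
occurrences of one candidate element is likewise an assumption made for the
example, suggested by the overlap noted in Section IX.

```python
from collections import defaultdict

# candidate data element -> list of recorded occurrences
occurrences = defaultdict(list)


def record(candidate, title, source, definition=None, data_items=()):
    """Log one appearance of a candidate data element during research."""
    occurrences[candidate].append(
        {
            "title": title,            # the title as it appears in the source
            "source": source,          # publication, form or system
            "definition": definition,  # any definition found, else None
            "data_items": list(data_items),
        }
    )


def titles_seen(candidate):
    """Return the distinct titles under which a candidate has appeared."""
    return sorted({o["title"] for o in occurrences[candidate]})


record("Surname", "Last Name", "Subscription Fulfillment record, field 8 (LNAME)")
record("Surname", "Surname", "Bibliographic data elements, experimental file")

print(titles_seen("Surname"))
```

A log of this kind supports step (4): the variant titles and sources for each
candidate are available side by side when the standard name is chosen.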

The experimental file proved effective in regulating the usage of class
terms used to refer to bibliographic data. As more records were received, the
existing data elements and their components in the file needed less and less
to be changed. More data use identifiers were required for the same data
elements, rather than new elements. Unfortunately, the amount of data
encountered during the identification phase was still too small to allow
any quantification for testing purposes. The use of a larger data base
for comparison against the present file of around three hundred terms
will probably allow measurement and some consideration of cost factors related
to the efficiency and maintenance of the data element file.











XI. RECOMMENDATIONS AND CONCLUSIONS 



A standard data element structure to control the use, transfer and
interchange of records, particularly from machine-readable files, was examined
and found viable. Three major areas of application for the file were
discovered at AIP with differing types of data elements and varying
requirements. Considerably more work is needed to clarify the formal
definitions of the basic terms, although analysis has made it possible to
understand and work with the essential data element structure. Standardization
is also required at a number of levels. The standard data element file is
highly adaptable and corrigible and can adjust to innovative standardization at
the national level by controlling data conversion as well as identifying common
elements and codes.

Operational implementation of the experimental prototype file is
recommended. It is also proposed that eventual automatic processing of the
operational file be considered.

The manually operated experimental data element file could be
automated in two sequential stages. The first stage could begin at a
semi-automated level. The beginning would be marked by studies and decisions
relevant to coding and formatting requirements. The open questions that were
raised with regard to standardization would have to be practically resolved.
These include the establishment of common codes for data elements and data
items, standard character types, allowable character set expansion, message
and field formats, flags and field size. Questions relating to media would
of course be determined by available equipment. The prototype file could then
be converted to automatic processing. Human judgment would, at least at the
outset, be needed to identify, define, and structure the terms. In addition,
human comparison of the model file data elements and the elements used by
other systems in machine processing of their data bases would have to be
performed at the initial stage.

If the method proves successful in automatically regulating data flow,
perhaps during the input stage for photocomposition or the output for
bibliographic retrieval, thought might be given to automation of the data
element file on a still larger scale. It could at such a more advanced stage
serve as a machine-controlled authority file on vocabularies. It is imaginable
that the value of such a file could ultimately outweigh the programming
problems that this approach would entail.
















APPENDIX I

SAMPLE LIST OF DATA ELEMENTS IN EXPERIMENTAL FILE 

1. BIBLIOGRAPHIC DATA ELEMENTS AND DATA GROUPINGS 

Personal Author(s) 

Forenames or Initials 
Surname 

Titles or Identifiers 

If Appear in Index Entry 
If Precede Forename in Natural Order 
If Corresponding Author 
If 1st, 2nd, 3rd Author
Corporate Author 
Title of Article 
Subtitle 

Author's Position 

Author's Affiliation (present) 

Name 

Location 

If 1st, 2nd, 3rd Author 
Author's Affiliation - Name, Location
Organization Where Work Was Done 
(If Different from Affiliation) 

Name 

Location 

If 1st, 2nd, 3rd Author 
Manuscript Received (or Submitted) Note 
Date 

Sponsor Note 

Presented at Conference Note
Miscellaneous Note
Abstract

Text of Article 
Subheading 
Table Caption 
Figure Caption 
Equation 

Equation Number 
Summary or Conclusion 
Numbered Footnotes 

Text Only (No Reference) 

Reference(s) Included
Text

Citation(s)

Book 

Author 

Title 

Edition 

Place Published 



Publisher 

Date 

Pages (or Volumes) 

Comment (Any Data Other Than Elements Listed Above) 
Part of Book 

Author of Part 
Title of Part
Author
Title
Edition 

Place Published 

Publisher 

Date 

Pages (or Volumes) 

Commentary (e.g. , "In," "Edited by") 

Journal Article 

Author of Article 
Title of Article 
Periodical Title 
Volume Number 
Beginning Page 
Date 

Commentary (e.g., suppl.)

Patent 

Author (Inventor) 

Country 
Patent Number 
Date (Year) 

Commentary 
Miscellaneous
Author 
Title 

Date (Year) 

Commentary 

References (at End of Article) 

Periodical Title 
Periodical CODEN 

Periodical Abbreviation, Non-USASI

Country of Publication 

Series 

Volume Number 

Issue Number 

Part Number 

Supplement Number 

Season 

Month 

Day 

Year 

Article Sequence Number 
Beginning Page Number 



Ending Page Number 
Subject Terms 
Subject Codes 

Short Title (Index Annotation)

Descriptive Phrase Annotation
UDC Number

AIP Classification Number
Type of Bibliographic Form 
(e.g., letter, research note) 

Type of Work 

(e.g., experimental, theoretical) 

Language 

Language of Summaries 
Acceptance Date 
Keyboarder 
Date Keyed 

Input Keyboard Number 
Translator(s)

Forenames or Initials 
Surnames 

Titles or Identifiers 
If Appear in Index
If Precede Forename 
If Not, Ed., Translator 
Title of Article (Original) 

Subtitle

Periodical Title (Original) 

Periodical CODEN (Original)

Periodical USASI Abbreviation (Original)
Periodical Abbreviation, Non-USASI (Original)
Periodical Translated Abbreviation (Original) 
Country of Publication (Original) 

Series (Original) 

Volume Number (Original) 

Issue Number (Original) 

Part Number (Original) 

Supplement Number (Original) 

Season (Original) 

Month (Original)

Day (Original) 

Year (Original) 

Beginning Page Number (Original) 

Ending Page Number (Original) 

Submission Date (Original)

Acceptance Date (Original)

Language (Original) 






2. MANAGEMENT DATA ELEMENTS

Education and Manpower Qualifications
Personal Data

Surname 
Middle Name 
Forename 
Address 
Birthdate
Marital Status 
Education 
Graduate 
Undergraduate

Secondary School

Employment Record 
References 

Title of Theses, Principal Research and Publications
Years of Training and Experience 
Preferred and Acceptable Positions 
Industrial Research 
Undergraduate Teaching 
Academic Research
Minimum Acceptable Salary 
Professional Affiliations 
Date of Availability 
Geographic Limitations 

3. SYSTEMS DATA ELEMENTS (Including Data Groupings and Data Use Identifiers)

SYSTEM DESCRIPTION FORM * 


Name of Organizational Unit 
Address of Organizational Unit 
Formal or Functional Name of System 

Name and Title of Person to Whom Your System Manager Reports
System Use of Software and Hardware Devices: 

Uniterm Cards ** 

Peek-a-boo Cards ** 

Edge-Notched Cards ** 

Standard Tabulating Cards 
Microimage Searching Devices ** 

Computers or Other Devices Using Paper Tape or Magnetic Media ** 
Specific Missions or Functions for Which System is Operating
Major Activities

Research and Development 
Production and Quality Control ** 

Marketing ** 

Design and Planning ** 

Others ** 

Medium of Storage of Documents That Are Retained
Full-Size Hard Copy **





Microcards (Opaque) **

Roll Microfilm **

Aperture Cards ** 

Microfiche, Sheet or Strip Microfilm ** 

Punched Cards 
Magnetic Tape ** 

M&gnetic Disc 
Magnetic Drum 
Others** 

Devices or Techniques used to Establish Relationships, Contexts* of 
Subject Coiicept Terms at Time of Input or Indexing 

Fixed Order of Subject or Concept Terms or Headings ** 

Role, Cause and/or Effect Indicators or Interfixes ** 

Partitioning of Document Via Links ** 

Indexiiig Classification, or Posting to Show Generic, Specific, 
Coordinate or Collateral Relationships, Including Cross-referencing** 
Functional Group Relationship Indicators. (e*g* chemical element 
indicators) ** 

Logical Connectives ** 

* From National Science Foundation, System Description Form for Nonconven- 
c;: tional Sclent! fie. and Technical Information Systems in Current Use, No* 4, 
Washington, D.C;, 1966, pp. xviii-xxvii. 

*'* Data I terns 

SYSTEMS AND OPERATIONS ANALYSIS

Innovation Required
Lack of Knowledge of Operational Requirements
Number of Organizational Users
Number of ADP Centers
Complexity of Program System Interface
Response Time Requirements
Stability of Design
On-Line Requirements
Total Object Instructions Delivered
Percent Delivered Object Instructions Reused
Total Nondelivered Object Instructions Produced
Percent Source Instructions Written in POL (Procedure-Oriented Language)
Percent of Total Object Instructions Discarded
Percent of Total Source Instructions Discarded
Number of Conditional Branches
Number of Words in the Data Base
Number of Classes of Items in the Data Base
Number of Input Message Types
Number of Output Message Types
Number of Input Variables
Number of Output Variables
Number of Words in Tables and Constants not in Data Base
Percent Clerical Instructions
Percent Mathematical Instructions
Percent Input/Output Instructions
Percent Logical Control Instructions
Percent Self-Checking Instructions
Percent Information Storage and Retrieval Functions
Percent Data Acquisition and Display Function
Percent Control or Regulation Function
Percent Decision-Making Functions
Percent Transformation Functions
Percent Generation Functions
Average Operate Time
Frequency of Operation
Insufficient Memory
Insufficient I/O Capacity
Stringent Timing Requirements
Number of Subprograms
Programming Language
POL Expansion Ratio
Support Program Availability
Internal Documentation
External Documentation
Total Number of Document Types
Total Number of External Document Types Written During a Programming Step
Total Number of Internal Document Types Available From Previous Step
Total Number of Internal Document Types Written During a Programming Step
Type of Program
    Business
    Scientific
    Utility
    Other
Compiler or Assembler Used
Developmental Computer Used
First Program on Computer
Average Turnaround Time
ADP Components Developed Concurrently
Special Display Equipment
Core Capacity
Random Access Device Used
Number of Bits per Word
Memory Access Time
Machine Add Time
Computer Cost
Percent Senior Programmers
Average Programmer Experience with Language
Average Programmer Experience with Application
Percent Programmers Participating in Program Design
Personnel Continuity
Maximum Number of Programmers
Lack of Management Procedures
Number of Agencies Concurring in Design
Customer Inexperience






Computer Operated by Agency Other than Program Developer
Program Developed at Site Other than the Operational Installation
Different Computers for Programming and Operation
Closed or Open Shop Operation
Number of Locations for Program Data Point Development
Number of Man Trips
Program Data Point Developed by Military Organization
Program Data Point Developed on Time-Shared Computer
Complexity of System Interface with Other Systems
Security Classification Level
Number of Sources of System Information
Accessibility of System Information
Degree of System Change Expected During Development
Degree of System Change Expected During System Operations
Number of Functions in the System
Number of System Components
Number of System Components (Not Off-the-Shelf)
Percent Senior Analysts
Quality of Resource Documents
The Availability of Special Tools
Degree of Standardization in Policy and Procedures
Number of Official Reviews of Documents
Personnel Turnover
Output Volume
Input Volume






APPENDIX II

AIP DATA ELEMENT DESCRIPTION SHEET                          Page 1 of 1
(CURRENT)

TYPE:    BASIC [ ]    COMPOSITE [ ]
ENTRY:   NEW [ ]    CHANGE [ ]    ADD. [ ]    DEL. [ ]

ELEMENT NAME: NUMBER OF MAN HOURS

DATA ELEMENT REFS: AIP: [illegible]
MNEMONIC ABBREV.:
QUALIFIERS:
OTHER REFS:

DEFINITION: Amount of staff time per man required to perform a given job
            or task, expressed in hour units.
            Range = 0 to n hour units

SYNONYMS: Man hour consumption
ABBREV.:
CODES:
EXPLANATION:

DATA ITEMS: (1) Sample: 5 hours
            (2)
            (3)
            (4)
            (5)

IDENTIFIER:
DATA USE:   (1) Programming
            (2) Job/Task Analysis
            (3)
            (5)
            (CONTINUED [ ])

DATA GROUP: Job Time Requirements

OTHER SYSTEMS (NOTES):
NOTES (SOURCE, REMARKS):

CLASSIFICATION: /1/1/2/
CODE:
ARRAY:

PREPARED BY: ARB    DATE 30 Jan. '68
REVIEWED BY: ARB    DATE 1 Feb. '68





















APPENDIX III

DATA ELEMENT DESCRIPTION                                    Page 1 of 1

DATA ELEMENT NAME: FORENAME

DATA ELEMENT REFERENCES: AIP: FORENM
                         OTHER:

TYPE:    Basic [ ]    Composite [ ]
CODE:

SYNONYMS: First Name, Given Name, Christian Name

DATA GROUP IDENTIFIER: Name (1)
REFERENCE:

DEFINITION: Designation that differentiates one individual from other
            members of his family; the first name, a distinctive designa-
            tion or appellation which precedes all other words that com-
            prise the full name.

DATA ITEMS: Sample: John in the grouping "John Q. Smith"

CODES:
AUXILIARY CODES:

DATA USE: Full alphabetic or initials:
DATA USE REFERENCE: used in Data Grouping "Author"

CLASSIFICATION:
    SYSTEMS
        1. Components
               Hardware
               Software
        2. Aspects
               Subject
               Physicists
               Users
               Institutions
    FILES

TAG:
ARRAY:

Prepared By: ARB    Date 19 Feb. '68
Reviewed By:        Date 20 Feb. '68
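A description sheet of this kind can be read as a record with a fixed set of fields. The following is only a sketch of that idea in Python: the class name, the field names, and the `matches` helper are assumptions made for illustration and do not appear in the report, and the element type is assumed to be "basic" because the checked box on the sheet is not legible.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElementDescription:
    """One data element description sheet (field names are illustrative)."""
    name: str
    mnemonic: str
    element_type: str = "basic"
    synonyms: List[str] = field(default_factory=list)
    data_group: str = ""
    definition: str = ""
    sample_items: List[str] = field(default_factory=list)
    data_use: str = ""

    def matches(self, term: str) -> bool:
        """True if term equals the element name or a synonym, ignoring case and spacing."""
        wanted = " ".join(term.lower().split())
        return any(wanted == n.lower() for n in [self.name, *self.synonyms])


# Values copied from the FORENAME sheet, except element_type (an assumption).
FORENAME = DataElementDescription(
    name="Forename",
    mnemonic="FORENM",
    element_type="basic",
    synonyms=["First Name", "Given Name", "Christian Name"],
    data_group="Name",
    definition="Designation that differentiates one individual from other "
               "members of his family.",
    sample_items=["John"],
    data_use='Used in data grouping "Author"',
)
```

Keeping the synonyms on the record itself means a lookup by any of the listed names can be answered from one sheet, for example `FORENAME.matches("given name")`.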












APPENDIX IV

Structuring Display of Bibliographic Data Elements

Structured Data Grouping            Data Element                 Sample Data Items
Level 1  Level 2  Level 3           Level 4                      Level 5  Level 6

Entries
  Main
    Author
      Personal Author               Surname,                     Smith
                                                                 Jones
                                    Middle Name or Initial(s),   Quincy
                                                                 Q.
                                    Forename or Initial(s)       John
                                                                 Albert
                                                                 J.

Structured Data Grouping            Data Element                 Data Items
Level 1  Level 2  Level 3  Level 4  Level 5                      Level 6  Level 7

      Corporate
        Name of Organization        Main Body                    U.S. House of Representatives
                                    Subdivision(s)               Committee on Science and
                                                                 Astronautics
                                    Sub-Subdivision(s)           Subcommittee on Science,
                                                                 Research and Development
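One way to read the display above is as a tree whose inner nodes are data groupings, whose lowest named nodes are data elements, and whose leaves are sample data items. The sketch below assumes that reading. The nested-dictionary layout, the name `STRUCTURE`, the label "Corporate Author", and the function `element_paths` are illustrative choices, not part of the report.

```python
from typing import Dict, Iterator, List, Tuple, Union

Node = Union[Dict[str, "Node"], List[str]]

# Groupings are dicts; a data element is a key whose value is a list of sample items.
STRUCTURE: Dict[str, Node] = {
    "Entries": {"Main": {"Author": {
        "Personal Author": {
            "Surname": ["Smith", "Jones"],
            "Middle Name or Initial(s)": ["Quincy", "Q."],
            "Forename or Initial(s)": ["John", "Albert", "J."],
        },
        "Corporate Author": {"Name of Organization": {
            "Main Body": ["U.S. House of Representatives"],
            "Subdivision(s)": ["Committee on Science and Astronautics"],
            "Sub-Subdivision(s)": ["Subcommittee on Science, Research and Development"],
        }},
    }}}
}


def element_paths(node: Dict[str, Node],
                  prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], List[str]]]:
    """Yield (path from the root, sample items) for every data element in the tree."""
    for key, value in node.items():
        path = prefix + (key,)
        if isinstance(value, list):
            yield path, value
        else:
            yield from element_paths(value, path)


for path, items in element_paths(STRUCTURE):
    # The length of the path is the level at which the data element sits.
    print(len(path), " > ".join(path), items)
```

In this layout the personal-author elements are reached after five names and the corporate-author elements after six, which is one way to express that the two branches of the display have different depths.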






APPENDIX V

OCCURRENCE OF COMMON BIBLIOGRAPHIC DATA ELEMENTS

BIBLIOGRAPHIC ENTRY (DATA ITEM)      DATA ELEMENT                        DATA GROUPING
                                                                         OR USE IDENTIFIER

Journal Article

Jones, John Q.                       1. Surname                          A. Name of Author
                                     2. Middle Name or Initial
                                     3. Forename
Physics Laboratory, Ohio             4. Name of Organization             B. Affiliation
State University, Columbus, Ohio        a. smallest unit                    of Author
                                        b. largest unit of
                                           organization
                                     5. Location of Organization
Conversion of Light to Ultrasonic    6. Title of Article
Energy
Annals of Physics                    7. a. Name of Journal
                                        b. CODEN (Journal code)
Vol. 26, No. 3, March 1967           8. Volume Number
                                     9. Issue Number
                                     10. Month                           C. Date of
                                     11. Year                               Publication
pp. 369-374                          12. Page number
                                        a. First
                                        b. Last

Book

Smith, Harry J.                      1.                                  A. Name of Author
                                     2.
                                     3.
Department of Theoretical Physics
Rockefeller Univ., New York, N.Y.
Atomic Collision Processes           13. Title of Book
McGraw-Hill, New York, N.Y.          14. Name of Publisher
                                     15. Place of Publication
1968                                 11.                                 C. Date of
                                                                            Publication

Patent

Brown, B. B.                         1.                                  D. Name of
                                     2. (Initial only)                      Inventor
                                     3. (Initial only)
American Institute of Physics        4. (Largest unit only)              F. Name of
                                                                            Assignee
Photoelectric scanning device        16. Patent Title
Cl. 250-239                          17. Patent Classification No.
1 Jan. 196[?]                        18. Day                             C. Date of
                                     10.                                    Issuance
                                     11.
Filed 31 Dec. 1967
9,999,999                            19. Country of Issuance
                                     20. Patent Number

OTHER ENTRIES                        DATA ELEMENT                        DATA GROUPING
                                                                         OR USE IDENTIFIER

Subscription Records                 1.                                  G. Name of
                                     2.                                     Subscriber
                                     3.
                                     4.                                  H. Name of
                                                                            Subscribing
                                                                            Organization
                                     21. Address                         I. Address of
                                                                            Subscribing
                                                                            Organization

Society Records                      1.                                  J. Name of
                                     2.                                     Society Member
                                     3.
                                     21.                                 K. Address of
                                                                            Society Member
                                     4.                                  L. Name and
                                     5.                                     Location of
                                                                            Organizational
                                                                            Member of
                                                                            Society
                                     21.                                 M. Address of
                                                                            Organizational
                                                                            Member of
                                                                            Society

Oral History Interviews              1.                                  N. Name of
                                     2.                                     Interviewer
                                     3.
                                     4.                                  O. Organizational
                                     5.                                     Affiliations
                                                                            of Interviewee
                                     6.                                  P. Publications of
                                     13.                                    Interviewee









OTHER ENTRIES (cont.)                DATA ELEMENT                        DATA GROUPING
                                                                         OR USE IDENTIFIER

National Manpower Register           1.                                  Q. Name of
                                     2.                                     Registrant
                                     3.
                                     21.                                 R. Address of
                                                                            Registrant
                                     4.                                  S. Affiliations of
                                     5.                                     Registrant

TABLE: APPLICATION OF STRUCTURED STANDARD DATA ELEMENTS

WHERE USED                                     DATA STRUCTURING OF ENTRIES

AIP Primary Journals                           A(1,2,3),B(4,5),6,7(a,b),8,9,C(10,11),12
                                               For book reference citations, also 13,14,15

Secondary Journals (e.g. Physics Abstracts)    A(1,2,3),B(4,5),6,7,8,9,C(10,11),12,13,
                                               14,15,16,17,18,19,20

Special Bibliographies and Critical Reviews    A(1,2,3),6,7,8,9,C(10,11),12,13,14,15

Subscription Fulfillment                       G(1,2,3),H(4,5),I(21)

AIP Societies Membership Records               J(1,2,3),K(21),L(4,5),M(21)

National Register of Physicists                Q(1,2,3),S(4,5)

Center for the History and Philosophy
of Physics
    Oral History Collection                    N(1,2,3),O(4,5),P(6,13,16)
    Niels Bohr Library                         A(1,2,3),7,8,9,C(18,10,11),12,13,14,15,
                                               16,17,19,20

Legend: The key is composed of a Data Grouping or Use Identifier
        (Alpha) plus Data Element (Numeric), e.g. Name of Author
        (A) = Surname (1) + Middle Name or Initial (2) + Forename
        (3) = A(1,2,3). The name of an AIP society member is
        composed of the same data elements = J(1,2,3).
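The key notation in the legend is regular enough to be read mechanically. The following is a minimal sketch, assuming keys are written exactly as in the table (an upper-case grouping letter followed by element numbers in parentheses, or a bare element number). The function name `parse_structure` and the regular expression are illustrative and not part of the report; lower-case sub-parts such as the `(a,b)` in `7(a,b)` are ignored by this sketch.

```python
import re
from typing import List, Tuple

# Either a grouping such as "A(1,2,3)" or a bare data element number such as "12".
_TOKEN = re.compile(r"([A-Z])\(([^()]*)\)|([0-9]+)")


def parse_structure(key: str) -> List[Tuple[str, List[str]]]:
    """Split a structuring key into (grouping identifier, [data element numbers]) pairs.

    Data elements that stand outside any grouping are returned with an
    empty grouping identifier.
    """
    result: List[Tuple[str, List[str]]] = []
    for group, members, bare in _TOKEN.findall(key):
        if group:
            numbers = [m.strip() for m in members.split(",") if m.strip()]
            result.append((group, numbers))
        else:
            result.append(("", [bare]))
    return result


# Subscription Fulfillment entry from the table above.
print(parse_structure("G(1,2,3),H(4,5),I(21)"))
```

Because the groupings A, G, J, N and Q all carry the element numbers 1, 2, 3, a parser of this kind makes it easy to confirm that a name is built from the same three data elements wherever it occurs.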


















APPENDIX VI

STANDARDIZATION OF DATA ELEMENTS

A more detailed examination of standardization efforts in the area
of data elements, their codes and formats required for information inter-
change may be helpful in understanding the background of this report. Two
aspects of the problem are considered. The first addresses itself to
standardization generally and to data element standardization on the whole.
The second looks at the recent history of data element standardization as
it relates to the AIP work.

Standardization

The following definition of standardization has been offered by
the Standing Committee for the Study of Scientific Principles of Standardization
(STACO) of the International Organization for Standardization (ISO), and adopted
by the ISO Council in 1962:

"Standardization is the process of formulating and applying rules
for an orderly approach to a specific activity for the benefit
and with the cooperation of all concerned, and in particular, for
the promotion of optimum overall economy taking due account of
functional conditions and safety requirements.

It is based on the consolidated results of science, technique
and experiences. It determines not only the basis for the
present but also for future development, and it should keep
pace with progress."

Dr. N.A.J. Voorhoeve in his paper "International Documentation in the Domain
of Standardization" points out a further specification by STACO:

"A standard is the result of a particular standardization effort,
approved by a recognized authority. It may take the form of a
document containing a set of conditions to be fulfilled..."

On this basis Dr. Voorhoeve draws the conclusion that "standardization of
documentation, to mention only one example, is certainly included in STACO's
definition."

We have seen above how the set of conditions to be fulfilled by data
element standardization applies to the identification of the names, meanings
and relationships of certain concepts of groupings of data items that are
interchanged and communicated between systems. Certainly the documentation
required to record what was identified and compiled is standardization
in Dr. Voorhoeve's sense. But this describes only one instance of the
standardization process. Actually, at least four levels of data element
standardization with different topical considerations can be identified.

The first applies to basic agreement about what constitutes the general
data element structure and requires the formulation of relevant definitions.
The second encompasses common agreements among the different users of
specific data elements with regard to the meanings and ways of representing
the meanings of these elements. Agreement may extend beyond the elements
to the data items on the one hand, and/or proceed in a different direction
toward the definition, form and users of the elements themselves. This
process must start at a local data system level. It may then rise to more
general, perhaps national and international levels of agreement. The third
area of standardization includes the whole thorny range of coding and
formatting problems: standardization of character types, character set
extensions, control characters, modes of representation (binary, octal,
decimal), message and field formatting and size, preferred media for
interchange (tapes, cards...), common codes for data elements, data items,
etc. The fourth and final area is perhaps the most difficult. The
standardization of standardization practices treats the question: how does
one go about getting other people to agree on things to be agreed on? How
does the realization that standardization is needed ever begin? Synonymy is
one way. Ambiguity of data terms is another.

Recognition of the need for a common and economical language arose in
the Department of Defense (DoD) with the development of high speed digital
data transmission systems. As the computer systems became bigger and
faster and centrally controlled decision making increasingly important, the
obstacles to systems integration due to linguistic factors became ever more
apparent. The same datum could be established as an element in several
data systems, with a different name or identification (synonymy), a partly
or totally different meaning and, almost invariably, a different code, either
in structure, size or both.
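The synonymy problem described above can be made concrete with a small lookup table that maps the names used by local systems onto one standard element. The following is a sketch under stated assumptions: FORENM and its synonyms are taken from the Forename sheet in Appendix III, while the mnemonic SURNM, the names "Last Name" and "Family Name", and the functions `build_index` and `standard_name` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List, Optional

# Standard mnemonic -> names under which local systems might carry the same datum.
SYNONYMS: Dict[str, List[str]] = {
    "FORENM": ["Forename", "First Name", "Given Name", "Christian Name"],
    "SURNM": ["Surname", "Last Name", "Family Name"],  # hypothetical entry
}


def _normalize(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def build_index(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert the synonym table; refuse a name claimed by two different standards."""
    index: Dict[str, str] = {}
    for standard, names in synonyms.items():
        for name in names:
            key = _normalize(name)
            if key in index and index[key] != standard:
                raise ValueError(f"ambiguous name {name!r}")
            index[key] = standard
    return index


def standard_name(local_name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the standard mnemonic for a local name, or None if it is unknown."""
    return index.get(_normalize(local_name))


INDEX = build_index(SYNONYMS)
print(standard_name("Given Name", INDEX))
```

The check inside `build_index` addresses the second difficulty named in the text, ambiguity: a name that would point to two different standard elements is rejected rather than silently resolved.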

The National Military Command System, a group planned at the highest
operational level, entered its implementation phase by the middle of 1962,
announcing at that time a program to standardize the data elements and
codes feeding from one data system or level to another. The hardware
capability to establish and maintain the multipurpose data files and
integrated systems necessary for centralized management was simultaneously
made available. Thus a hardware environment was provided which was to be
dependent on standard data elements and codes.

Historical Survey

The need to facilitate data interchange and systems integration required
for high-speed data transmission led to a determined effort to standardize
the data elements used within the DoD.

DoD data standardization became operational with the establishment of
the DoD data standards organization in the Office of the Assistant Secretary
of Defense (Comptroller) on June 10, 1964, and that of the Data Standards
Division in 1964.

Informed guesses have estimated that the number of data elements in
DoD data systems totals upwards of 200,000. DoD has already standardized
a number of fields, including geographical areas around the world and the
states of the United States of America. It is currently working on the
Military Standard Contract Administration Procedures (MILSCAP) data processing
system (Project 60), which will be based on a fully standardized data element
vocabulary.

The DoD has developed the largest ongoing data standardization opera-
tion. However, DoD is only one of the Federal agencies being coordinated
in the Bureau of the Budget (BoB) effort to integrate the data systems used
throughout the entire Federal Government. The BoB has recently issued a
circular (BoB Circular A-86) which defines specific policies and
responsibilities, together with procedures by which the recommendations of
its Task Forces will be developed and adopted as Federal standards for Data
Elements and Codes. At present, there are seven Task Forces covering business,
individual, time, government agency, state and country and place codes as
well as countries of the world.

Technical advice and the maintenance of Federal registers required
for this centralized program will be provided by the National Bureau of
Standards.




A number of professional and industrial organizations have been
concerned with general standardization endeavors to promote data inter-
change capabilities. Among these are the World Meteorological Organization,
the Air Transport Association and a number of international and national
voluntary standards organizations.

The United States of America Standards Institute (USASI) is the
principal organization for United States data element standardization work
at both the national and international levels. At the national level
standardization efforts are under the cognizance of USASI Subcommittee X3.8,
Data Elements, Codes and Formats, organized in 1966 and currently working
under the chairmanship of Mr. David V. Savidge of UNIVAC and the
vice-chairmanship of Mr. Harry S. White, Jr. of the National Bureau of
Standards. The mission of Subcommittee X3.8 is to develop standards and
related understandings in the area to facilitate information interchange.
The work will attempt to develop, in addition, a standard method of
describing and designating data formats for data interchange.

Detailed work is handled by six task groups, administrative and special
tasks by two ad hoc groups and a steering committee. The task groups are as
follows:

X3.8.1 Standardization Criteria:

This committee is responsible for definitions, criteria, methodology,
glossary.

X3.8.2 Time Designations:

This work area includes both macroscopic and microscopic time periods.
It now appears that the first standard proposal will be for calendar dates
in data systems and will recommend the order YEAR-MONTH-DAY.

X3.8.3 Individuals and Organizations:

This work is organized into two areas:

a) Personal Identifiers - One of the proposals under consideration
is to use the Social Security Number as the identifier of individu-
als. However, there are existing legal restrictions that must be
clarified or changed before this proposal can be adopted. A
further need is verification of number/name combinations. The
major questions of cost, organization, and feasibility of a
national verification system are now under study.












A second study project covers the procedure for representing
individual names and the use of extraction systems such as Soundex
Code.

b) Organization - This study area covers identifiers for organiza-
tions, both governmental and private. Severe questions exist, for
example, as to the problem of widely diversified branches or
divisions of large organizations such as multi-division firms,
holding companies, school systems, and the like.

X3.8.4 Geographic Units:

This Working Group is maintaining close coordination with a similar
Task Force of the Bureau of the Budget study (described above) which is
studying geographic units. The Committee will primarily look for and deal
with situations which will be important to private industry and state and
local governments.

X3.8.5 Data Structures:

This work area includes arrangement of data into formats and the
syntactical rules necessary to separate elements of data.

X3.8.6 Quantitative Values in Data Systems:

This work includes:

a) The problem of specifying quantitative data.

b) Error detection and correction (self-checking) codes. Check
characters can be added to a base number or code so that at given
points in the processing it will be possible to check whether the
number is correct. This check character is determined by mathematical
formulas using the characters in the base or code.
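The report does not say which formula the task group had in mind for check characters. As an illustration only, the sketch below uses the widely known Luhn (modulus 10) scheme, which derives one check digit from the digits of the base number. The function names are assumptions made for this example.

```python
def luhn_check_digit(base: str) -> str:
    """Compute the Luhn (modulus 10) check digit for a string of decimal digits."""
    total = 0
    # Walk the base right to left, doubling every second digit starting
    # with the rightmost one; a doubled value above 9 has 9 subtracted.
    for position, char in enumerate(reversed(base)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def is_valid(number: str) -> bool:
    """True if the last digit of number is the Luhn check digit of the rest."""
    if not number.isdigit() or len(number) < 2:
        return False
    return luhn_check_digit(number[:-1]) == number[-1]


base_number = "7992739871"
full_number = base_number + luhn_check_digit(base_number)
print(full_number, is_valid(full_number))
```

A check of this kind catches any single mistyped digit, which is the sort of verification "at given points in the processing" that the text describes. It detects errors but does not correct them.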






REFERENCES

(1)  McGee, William C., The Formulation of Data Processing Problems for Computers,
     in Advances in Computers, edited by Franz L. Alt and Morris Rubinoff, Vol. 4,
     Academic Press, New York, 1963, pp. 1-52, esp. pp. 38-49

(2)  Evans, O.Y., Advanced Analysis Method for Integrated Electronic Data Processing,
     General Information Manual F20-8047, International Business Machines Corp.,
     New York, N.Y. (no date)

(3)  cf. IBM Systems Reference Library, IBM System/360, PL/I Reference Manual, File
     No. S360-29, IBM, New York, N.Y., 1967

(4)  Machine Recording of Textual Information during the Publication of Scientific
     Journals, Report on Work Done on National Science Foundation Contract 305,
     during Period January 1, 1963, to May 30, 1965, Prepared by Lawrence F. Buckland,
     May 30, 1965, Inforonics Inc., Maynard, Massachusetts, 1965

(5)  Curran, Ann T. and Henrietta D. Avram, The Identification of Data Elements in
     Bibliographic Records, Final Report of the Special Project on Data Elements for
     the Subcommittee on Machine Input Records (SC-2) of the Sectional Committee on
     Library Work and Documentation (Z-39) of the United States of America Standards
     Institute, New York, N.Y., May 1967

(6)  Libbey, Miles A. and Gerald Zaltman, The Role and Distribution of Written
     Informal Communication in Theoretical High Energy Physics, New York, N.Y.,
     American Institute of Physics, August 1967. Available from AIP as Report No.
     AIP/SDD-1 (rev.), also USAEC Report No. NYO-3732-1 (rev.)

(7)  Toward a Disciplined Data Systems Language, in Data Processing Yearbook for
     1966, pp. 107-116

(8)  United States of America Standards Institute, Subcommittee Z39 SC 2 - Machine
     Input Records, Proposed U.S. Standard for a Format for the Communication of
     Bibliographic Information in Digital Form, Z39 SC 2 (1968(2)), New York, N.Y.,
     1968

(9)  Licklider, J.C.R., et al., "Report of the Office of Science and Technology
     Ad Hoc Panel on Scientific and Technical Communications", Washington, D.C.,
     Office of Science and Technology, 8 February 1965

(10) Executive Office of the President, Office of Science and Technology, Task
     Group for Interchange of Scientific and Technical Information in Machine
     Language (ISTIM), Reporting to the Executive Office, Final Report, Washington,
     D.C., April 3, 1968

(11) Avram, Henrietta D., John F. Knapp, and Lucia Rather, The MARC II Format,
     Information Systems Office, Library of Congress, Washington, D.C., January
     1968



